Evaluation of inpatient medication guidance from an artificial intelligence chatbot
Jennifer Beavers, Ryan F Schell, Halden VanCleave, Ryan C Dillon, Austin Simmons, Huiding Chen, Qingxia Chen, Shilo Anders, Matthew B Weinger, Scott D Nelson
Abstract
Disclaimer
In an effort to expedite the publication of articles, AJHP is posting manuscripts online as soon as possible after acceptance. Accepted manuscripts have been peer-reviewed and copyedited, but are posted online before technical formatting and author proofing. These manuscripts are not the final version of record and will be replaced with the final article (formatted per AJHP style and proofed by the authors) at a later time.
Purpose
To analyze the clinical completeness, correctness, usefulness, and safety of chatbot and medication database responses to everyday inpatient medication-use questions.
Methods
We evaluated the responses from an artificial intelligence chatbot, a medication database, and clinical pharmacists to 200 real-world medication-use questions. Answer quality was rated by a blinded group of pharmacists, providers, and nurses. Chatbot and medication database responses were deemed “acceptable” if the mean reviewer rating was within 3 points of the mean rating for pharmacists’ answers. We used descriptive statistics for reviewer ratings and Kendall’s coefficient to evaluate interrater agreement.
Results
The medication database generated responses to 194 (97%) of the 200 questions, with 88% rated acceptable relative to pharmacists’ answers for clinical correctness, 76% for completeness, 83% for safety, and 81% for usefulness. The chatbot responded to only 160 (80%) of the questions, with 85% rated acceptable for clinical correctness, 65% for completeness, 71% for safety, and 68% for usefulness.
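The reported figures can be tabulated to make the comparison explicit. This sketch only restates the published percentages and does simple arithmetic on them. The abstract does not say whether each acceptability percentage is computed over the questions a source answered or over all 200 questions, so the percentages are compared exactly as reported. The names `RESULTS`, `response_rate`, and `gap_in_points` are illustrative.

```python
"""Illustrative tabulation of the percentages reported in the Results.

The denominator behind each acceptability percentage is not stated in the
abstract; values are used exactly as reported."""

TOTAL_QUESTIONS = 200
DOMAINS = ("correctness", "completeness", "safety", "usefulness")

# Percent of responses rated acceptable relative to pharmacists' answers.
RESULTS = {
    "medication database": {"answered": 194, "correctness": 88,
                            "completeness": 76, "safety": 83, "usefulness": 81},
    "chatbot": {"answered": 160, "correctness": 85,
                "completeness": 65, "safety": 71, "usefulness": 68},
}


def response_rate(source):
    """Fraction of the 200 questions for which the source produced an answer."""
    return RESULTS[source]["answered"] / TOTAL_QUESTIONS


def gap_in_points(domain):
    """Percentage-point lead of the database over the chatbot in one domain."""
    return RESULTS["medication database"][domain] - RESULTS["chatbot"][domain]


if __name__ == "__main__":
    for source in RESULTS:
        print(f"{source}: responded to {response_rate(source):.0%} of questions")
    for domain in DOMAINS:
        print(f"{domain}: database leads by {gap_in_points(domain)} points")
```

The database leads by 3 percentage points for clinical correctness but by 11 to 13 points for completeness, safety, and usefulness, which is the pattern summarized in the Conclusion.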
Conclusion
Traditional search methods using a drug database provide more clinically correct, complete, safe, and useful answers than a chatbot. When the chatbot generated a response, its clinical correctness was similar to that of the drug database; however, its responses were rated less favorably for clinical completeness, safety, and usefulness. Our results highlight the need for ongoing training and continued improvement of artificial intelligence chatbots before they can be incorporated reliably into the clinical workflow. With continued improvement in functionality, chatbots could become a useful adjunct to pharmacists, providing healthcare providers with quick and reliable answers to medication-use questions.