Assessment of Artificial Intelligence Performance on the Otolaryngology Residency In‐Service Exam
Arushi P. Mahajan, Christina L. Shabet, Joshua Smith, Shannon F. Rudy, Robbi A. Kupfer, Lauren A. Bohm
Otorhinolaryngology – Surgery
Abstract
Objectives
This study seeks to determine the potential use and reliability of a large language model for answering questions in a sub‐specialized area of medicine, specifically practice exam questions in otolaryngology–head and neck surgery, and to assess its current efficacy for surgical trainees and learners.
Study Design and Setting
All available questions from a public, paid‐access question bank were manually input into ChatGPT.
Methods
Outputs from ChatGPT were compared against the benchmark of the answers and explanations provided by the question bank. Responses were assessed in two domains: accuracy of the answer and comprehensiveness of the explanation.
Results
Overall, our study demonstrates a ChatGPT correct answer rate of 53% and a correct explanation rate of 54%. Answer and explanation accuracy decreased as question difficulty increased.
Conclusion
Currently, artificial intelligence‐driven learning platforms are not robust enough to serve as reliable medical education resources for learners in subspecialty‐specific patient decision‐making scenarios.