DOI: 10.1002/oto2.98 ISSN: 2473-974X

Assessment of Artificial Intelligence Performance on the Otolaryngology Residency In‐Service Exam

Arushi P. Mahajan, Christina L. Shabet, Joshua Smith, Shannon F. Rudy, Robbi A. Kupfer, Lauren A. Bohm



Objective

This study seeks to determine the potential use and reliability of a large language model for answering questions in a sub‐specialized area of medicine, specifically practice exam questions in otolaryngology–head and neck surgery, and to assess its current efficacy for surgical trainees and learners.

Study Design and Setting

All available questions from a public, paid‐access question bank were manually entered into ChatGPT.


Methods

Outputs from ChatGPT were compared against the benchmark of the answers and explanations provided by the question bank. Questions were assessed in 2 domains: answer accuracy and comprehensiveness of the explanation.


Results

Overall, our study demonstrates a ChatGPT correct answer rate of 53% and a correct explanation rate of 54%. Answer and explanation accuracy decreased as question difficulty increased.


Conclusion

Currently, artificial intelligence‐driven learning platforms are not robust enough to serve as reliable medical education resources or to assist learners in sub‐specialty‐specific patient decision‐making scenarios.
