DOI: 10.1002/oto2.98 ISSN: 2473-974X

Assessment of Artificial Intelligence Performance on the Otolaryngology Residency In‐Service Exam

Arushi P. Mahajan, Christina L. Shabet, Joshua Smith, Shannon F. Rudy, Robbi A. Kupfer, Lauren A. Bohm

Abstract

Objectives

This study seeks to determine the potential use and reliability of a large language model for answering questions in a subspecialized area of medicine, specifically practice exam questions in otolaryngology–head and neck surgery, and to assess its current efficacy for surgical trainees and learners.

Study Design and Setting

All available questions from a public, paid‐access question bank were manually entered into ChatGPT.

Methods

Outputs from ChatGPT were compared against the answers and explanations provided by the question bank. Responses were assessed in 2 domains: accuracy of the answer and comprehensiveness of the explanation.

Results

Overall, our study demonstrates a ChatGPT correct answer rate of 53% and a correct explanation rate of 54%. Answer and explanation accuracy decreased as question difficulty increased.

Conclusion

Currently, artificial intelligence‐driven learning platforms are not robust enough to serve as reliable medical education resources for learners in subspecialty‐specific patient decision‐making scenarios.
