Abstract 16722: Performance Evaluation of ChatGPT 4.0 on Cardiovascular Clinical Cases From the USMLE Step 2CK and Step 3 of the National Board of Medical Examiners
Joseph Kassab, Christopher Massad, Vishwum Kapadia, Abdel Hadi El Hajjar, Joseph El Dahdah, Michel Chedid El Helou, Elio Haroun, Jay Ramchand, Serge C Harb
Introduction: As the prominence of online chat-based artificial intelligence (AI) platforms increases, medical students and residents in training may be inclined to use these tools as supplementary resources while preparing for their United States Medical Licensing Examination (USMLE).
Hypothesis: The aim of this study was to assess the performance of ChatGPT 4.0 in responding to cardiovascular (CV) clinical vignettes from USMLE sample questions provided by the National Board of Medical Examiners (NBME).
Methods: We extracted the 31 CV clinical vignettes from the official NBME USMLE Step 2 Clinical Knowledge and Step 3 self-assessments and submitted them to ChatGPT 4.0. The clinical cases encompassed a range of topics, including preventive medicine, congenital heart disease, valvular heart disease, electrophysiology, heart failure, and pericardial disease. Each of the 31 cases was presented as a multiple-choice question (MCQ): 29 of the 31 had at least 5 answer choices, while the remaining 2 had 4 answer choices each. The performance of the AI model was assessed against the NBME-provided answers. Subsequently, two experienced cardiologists independently reviewed the explanations provided by the model for each case and classified them as “accurate” if they aligned with appropriate guidelines, or “inaccurate” if they conveyed incorrect or inappropriate information.
Results: The AI model attained a score of 90.3% (28/31). The initial agreement between the 2 reviewers was near-perfect, with a Cohen’s kappa of 0.85. Any identified discrepancies were subsequently addressed and resolved. Of the correctly answered questions, the AI model provided accurate explanations in 89.2% of cases (25/28).
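Cohen’s kappa, the inter-rater agreement statistic reported above, corrects observed agreement for the agreement expected by chance from each rater’s label frequencies. The sketch below illustrates the computation; the rater labels are hypothetical and are not the study’s data.

```python
# Illustrative sketch: Cohen's kappa for two raters labeling the same items
# as "acc" (accurate) or "inacc" (inaccurate). Labels are hypothetical.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings for 10 items (for illustration only; the study
# compared two cardiologists' ratings over 28 correctly answered cases).
a = ["acc", "acc", "inacc", "acc", "acc", "acc", "inacc", "acc", "acc", "acc"]
b = ["acc", "acc", "inacc", "acc", "inacc", "acc", "inacc", "acc", "acc", "acc"]
print(round(cohens_kappa(a, b), 2))  # → 0.74
```

Here the raters agree on 9 of 10 items (p_o = 0.90) but chance agreement is high (p_e = 0.62), so kappa is 0.74, below the 0.85 reported in the study; a kappa of 1.0 indicates perfect agreement.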
Conclusions: The high performance of ChatGPT 4.0 suggests it could be an effective additional CV study resource for residents and medical students preparing for the USMLE Step 2 CK and Step 3 exams.