DOI: 10.1002/ca.24271 ISSN: 0897-3806

Can ChatGPT Generate Acceptable Case‐Based Multiple‐Choice Questions for Medical School Anatomy Exams? A Pilot Study on Item Difficulty and Discrimination

Yavuz Selim Kıyak, Ayşe Soylu, Özlem Coşkun, Işıl İrem Budakoğlu, Tuncay Veysel Peker

ABSTRACT

Developing high‐quality multiple‐choice questions (MCQs) for medical school exams is effortful and time‐consuming. In this study, we investigated the ability of ChatGPT to generate case‐based anatomy MCQs with acceptable levels of item difficulty and discrimination for medical school exams. Following a framework for artificial intelligence (AI)‐assisted item generation, we used ChatGPT to generate case‐based anatomy MCQs for an endocrine and urogenital system exam. The questions were evaluated by experts, approved by the department, and administered to 502 second‐year medical students (372 Turkish‐language, 130 English‐language). The items were then analyzed to determine their discrimination and difficulty indices. The item discrimination indices ranged from 0.29 to 0.54, indicating acceptable differentiation between high‐ and low‐performing students. All six items in Turkish and five of the six items in English met the higher discrimination threshold (≥ 0.30) required for large‐scale standardized tests. The item difficulty indices ranged from 0.41 to 0.89, with most items falling within the moderate difficulty range (0.20–0.80). We conclude that ChatGPT can generate case‐based anatomy MCQs with acceptable psychometric properties, offering a promising tool for medical educators. However, human expertise remains crucial for reviewing and refining AI‐generated assessment items. Future research should explore AI‐generated MCQs across various anatomy topics and investigate different AI models for question generation.
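
For readers unfamiliar with the two indices reported above, the following is a minimal sketch of how they are commonly computed. The abstract does not state which formulas the authors used, so the upper-lower 27% grouping for discrimination is an assumption made for illustration, as are the function names and the made-up responses; only the 0.20–0.80 and ≥ 0.30 cut-offs come from the abstract.

```python
from typing import List


def item_difficulty(responses: List[int]) -> float:
    """Proportion of examinees who answered the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def item_discrimination(responses: List[int],
                        total_scores: List[float],
                        fraction: float = 0.27) -> float:
    """Upper-lower discrimination index: p(correct | top group) - p(correct | bottom group).

    The 27% group size is a common convention and an assumption here,
    not something the abstract specifies.
    """
    n = len(responses)
    k = max(1, round(n * fraction))
    # Rank examinees by their total exam score, lowest first.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(responses[i] for i in upper) / k
    p_lower = sum(responses[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: ten examinees, one item, total exam scores 1..10.
example_responses = [0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
example_totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

p = item_difficulty(example_responses)
d = item_discrimination(example_responses, example_totals)

# Cut-offs quoted in the abstract.
is_moderate = 0.20 <= p <= 0.80
meets_higher_threshold = d >= 0.30
print(p, d, is_moderate, meets_higher_threshold)
```

A point-biserial correlation between item score and total score is another widely used discrimination measure and could be substituted for the upper-lower index in this sketch.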
