Diagnostic Performance of Dental Professionals and a Vision-enabled Artificial Intelligence Model (ChatGPT-4o) in Radiographic Detection of Apical Lesions
Hanadi Sabban
Background:
This study evaluated the diagnostic performance of ChatGPT-4o, a vision-capable large language model, in identifying apical periodontitis on periapical radiographs. Its accuracy was compared to that of board-certified oral radiologists and endodontists. In addition, the study examined how clinical and radiographic co-factors influenced diagnostic outcomes.
Materials and Methods:
In this retrospective cross-sectional diagnostic accuracy study, 166 periapical radiographs were independently assessed by four reader groups: board-certified oral radiologists, endodontists, baseline ChatGPT-4o, and ChatGPT-4o applied to single-tooth cropped images. Outcomes were binary (lesion present/absent). Analyses included overall accuracy, inter-rater agreement (Fleiss’ κ), area under the receiver operating characteristic curve, and multivariable logistic regression. Covariates were radiographic quality, root morphology, crestal bone loss, and tooth position. Images originated from a university dental hospital archive under institutional oversight.
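For readers unfamiliar with the agreement statistics named above, the following is a minimal sketch of how overall accuracy and Fleiss’ κ can be computed for binary lesion calls. It is illustrative only and is not the study's analysis code: the function names, the 0/1 coding (0 = lesion absent, 1 = lesion present), and the toy ratings are all assumptions introduced here.

```python
import math


def accuracy(predictions, reference):
    """Share of binary calls that match the reference standard."""
    if len(predictions) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predictions, reference))
    return correct / len(reference)


def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss' kappa for a fixed number of raters per image.

    ratings: one list per image, holding one label per rater.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each image needs the same number (>= 2) of ratings")

    # counts[i][j]: how many raters put image i into category j
    counts = [[row.count(c) for c in categories] for row in ratings]

    # Observed agreement per image, then averaged over images
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        # Every rating falls in one category; kappa is undefined
        return math.nan
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: 4 images, 3 raters each (not study data)
toy_ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
toy_reference = [1, 0, 1, 0]
first_rater = [row[0] for row in toy_ratings]

print("accuracy of first rater:", accuracy(first_rater, toy_reference))
print("Fleiss' kappa:", fleiss_kappa(toy_ratings))
```

Fleiss’ κ corrects the observed agreement for the agreement expected by chance, so values near 0 indicate chance-level agreement and values near 1 indicate near-perfect agreement among readers.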
Results:
Oral radiologists and endodontists showed higher diagnostic accuracy (57.1% and 56.8%) than ChatGPT-4o using cropped images (41.7%) or standard input (36.3%) (
Conclusion:
ChatGPT-4o performed below clinical experts for detecting apical lesions on periapical radiographs. Image cropping improved model results but did not reach expert performance. The study indicates that both radiographic errors and anatomical complexity substantially limit diagnostic accuracy.