DOI: 10.1210/clinem/dgad503

Artificial Intelligence Model Assisting Thyroid Nodule Diagnosis and Management: A Multicenter Diagnostic Study

Eun Ju Ha, Jeong Hoon Lee, Da Hyun Lee, Jayoung Moon, Haein Lee, You Na Kim, Minji Kim, Dong Gyu Na, Ji-hoon Kim
  • Biochemistry (medical)
  • Clinical Biochemistry
  • Endocrinology
  • Biochemistry
  • Endocrinology, Diabetes and Metabolism

Abstract

Purpose

To develop and validate a deep-learning-based AI model (AI-Thyroid) for thyroid cancer diagnosis, and to explore how it can improve diagnostic performance.

Materials and Methods

The system was trained using 19,711 images of 6,163 patients in a tertiary hospital. It was validated using 11,185 images of 4,820 patients in 24 hospitals (test set 1) and 4,490 images of 2,367 patients in ____ (test set 2). The clinical implications were determined by comparing the findings of six physicians with different levels of experience (group 1: four trainees, and group 2: two faculty radiologists) before and after AI-Thyroid assistance.

Results

The area under the receiver operating characteristic (AUROC) curve of AI-Thyroid was 0.939. The AUROC, sensitivity, and specificity were 0.922, 87.0%, and 81.5% for test set 1 and 0.938, 89.9%, and 81.6% for test set 2. The AUROCs of AI-Thyroid did not differ significantly according to the prevalence of malignancies (> 15.0% vs. ≤ 15.0%, p = 0.226). In the simulated scenario, AI-Thyroid assistance changed the AUROC, sensitivity, and specificity from 0.854 to 0.945, from 84.2% to 92.7%, and from 72.9% to 86.6% (all p < 0.001) in group 1, and from 0.914 to 0.939 (p = 0.022), from 78.6% to 85.5% (p = 0.053) and from 91.9% to 92.5% (p = 0.683) in group 2. The interobserver agreement improved from moderate to substantial in both groups.
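For readers less familiar with the metrics reported above, the following is a minimal sketch of how AUROC, sensitivity, and specificity can be computed from per-nodule malignancy scores and reference labels. It is not the study's evaluation code: the function names, the 0.5 threshold, and the toy data are assumptions chosen for illustration only.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), calling scores >= threshold malignant."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)  # malignant, flagged
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)   # malignant, missed
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)   # benign, cleared
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)  # benign, flagged
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen malignant case scores
    higher than a randomly chosen benign case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = malignant, 0 = benign); not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

toy_auroc = auroc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
print(f"AUROC={toy_auroc:.4f} sensitivity={toy_sens:.2f} specificity={toy_spec:.2f}")
```

AUROC summarizes ranking quality across all thresholds, whereas sensitivity and specificity depend on the single operating threshold chosen, which is why the abstract reports all three.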

Conclusion

AI-Thyroid can improve diagnostic performance and interobserver agreement in thyroid cancer diagnosis, especially in less-experienced physicians.
