DOI: 10.1001/jamaophthalmol.2024.0045 ISSN: 2168-6165

Multinational External Validation of Autonomous Retinopathy of Prematurity Screening

Aaron S. Coyner, Tom Murickan, Minn A. Oh, Benjamin K. Young, Susan R. Ostmo, Praveer Singh, R. V. Paul Chan, Darius M. Moshfeghi, Parag K. Shah, Narendran Venkatapathy, Michael F. Chiang, Jayashree Kalpathy-Cramer, J. Peter Campbell


Importance

Retinopathy of prematurity (ROP) is a leading cause of blindness in children, with significant disparities in outcomes between high-income and low-income countries, due in part to insufficient access to ROP screening.


Objective

To evaluate how well autonomous artificial intelligence (AI)–based ROP screening can detect more-than-mild ROP (mtmROP) and type 1 ROP.

Design, Setting, and Participants

This diagnostic study evaluated the performance of an AI algorithm, trained and calibrated using 2530 examinations from 843 infants in the Imaging and Informatics in Retinopathy of Prematurity (i-ROP) study, on 2 external datasets (6245 examinations from 1545 infants in the Stanford University Network for Diagnosis of ROP [SUNDROP] and 5635 examinations from 2699 infants in the Aravind Eye Care Systems [AECS] telemedicine programs). Data were taken from 11 and 48 neonatal care units in the US and India, respectively. Data were collected from January 2012 to July 2021, and data were analyzed from July to December 2023.


Exposures

An image-processing pipeline was created using deep learning to autonomously identify mtmROP and type 1 ROP in eye examinations performed via telemedicine.

Main Outcomes and Measures

The area under the receiver operating characteristic curve (AUROC) as well as sensitivity and specificity for detection of mtmROP and type 1 ROP at the eye examination and patient levels.
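As an illustration of these measures, the following minimal sketch computes sensitivity, specificity, and AUROC from reference labels and model scores. The function names, the toy data, and the 0.5 threshold are hypothetical and are not taken from the study; the AUROC is computed with the standard pairwise (Mann-Whitney) formulation, which may differ from the software the authors used.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold counts as a positive screen."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y and predicted_positive:
            tp += 1
        elif y:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = disease present on the reference standard, 0 = absent.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
area = auroc(toy_labels, toy_scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUROC={area:.3f}")
```

The same functions could in principle be applied at the patient level by first collapsing each infant's examinations into one label and one score; how the study performed that aggregation is not described in this abstract.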


Results

The prevalences of mtmROP and type 1 ROP were 5.9% (91 of 1545) and 1.2% (18 of 1545), respectively, in the SUNDROP dataset and 6.2% (168 of 2699) and 2.5% (68 of 2699) in the AECS dataset. Examination-level AUROCs for mtmROP and type 1 ROP were 0.896 and 0.985, respectively, in the SUNDROP dataset and 0.920 and 0.982 in the AECS dataset. At the cross-sectional examination level, mtmROP detection had high sensitivity (SUNDROP: mtmROP, 83.5%; 95% CI, 76.6-87.7; type 1 ROP, 82.2%; 95% CI, 81.2-83.1; AECS: mtmROP, 80.8%; 95% CI, 76.2-84.9; type 1 ROP, 87.8%; 95% CI, 86.8-88.7). At the patient level, all infants who developed type 1 ROP screened positive (SUNDROP: 100%; 95% CI, 81.4-100; AECS: 100%; 95% CI, 94.7-100) prior to diagnosis.
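The arithmetic behind the prevalence figures and the patient-level intervals can be checked with a short sketch. The abstract does not say which interval method was used; the code below assumes an exact (Clopper-Pearson) binomial interval, whose lower bound reduces to (α/2)^(1/n) when all n cases are detected. The function names are hypothetical. Under this assumption the lower bound is roughly 94.7% for 68 of 68 infants and roughly 81.5% for 18 of 18, close to the reported 94.7% and 81.4%.

```python
def prevalence_percent(cases: int, infants: int) -> float:
    """Prevalence as a percentage of infants."""
    return 100.0 * cases / infants


def exact_lower_bound_all_detected(n: int, alpha: float = 0.05) -> float:
    """Clopper-Pearson lower confidence bound for a proportion when all n of n are positive.

    Assumption: an exact binomial interval; the study's actual method is not stated here.
    """
    return (alpha / 2) ** (1.0 / n)


# Counts taken from the Results paragraph.
sundrop = {"infants": 1545, "mtmROP": 91, "type1": 18}
aecs = {"infants": 2699, "mtmROP": 168, "type1": 68}

for name, d in (("SUNDROP", sundrop), ("AECS", aecs)):
    mtm = prevalence_percent(d["mtmROP"], d["infants"])
    t1 = prevalence_percent(d["type1"], d["infants"])
    lower = 100.0 * exact_lower_bound_all_detected(d["type1"])
    print(f"{name}: mtmROP {mtm:.1f}%, type 1 ROP {t1:.1f}%, "
          f"lower 95% bound for 100% patient-level sensitivity {lower:.1f}%")
```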

Conclusions and Relevance

Where and when ROP telemedicine programs can be implemented, autonomous ROP screening may be an effective force multiplier for secondary prevention of ROP.
