DOI: 10.3390/bioengineering13070762 ISSN: 2306-5354

Active Learning Under Expert-Budget Constraints: A Human-in-the-Loop Pipeline for Diabetic Retinopathy Lesion Detection

Hyeok Kim, Seok-Min Chang, Bo-Young Lim, Soo Young Lee, Ho-Gil Jung

Early diagnosis of Diabetic Retinopathy (DR) is critical for preventing irreversible vision loss, but precise lesion annotation by ophthalmologists is the dominant cost in building any clinical-grade DR detection model. The structural problem in real hospital settings is not labeling cost per se but expert availability: ophthalmologists’ time is bounded by clinical duties, so the active-learning (AL) cycle can iterate only a handful of times in practice. We frame this constraint explicitly and ask which AL designs work best under a tight expert budget. We propose Virtuous Cycle, a Human-in-the-Loop (HITL) pipeline that integrates (i) a YOLOv8x-based object detector for microaneurysms, hemorrhages, and exudates, (ii) four AL sampling strategies (Average Confidence, Random, Hybrid-Diversity, Monte Carlo (MC) Dropout), and (iii) an in-hospital annotation platform (Diavision Studio) in which clinicians refine AI pre-labels rather than draw from scratch. We evaluate Virtuous Cycle on a real-world fundus dataset from the National Medical Center (NMC) across eight AL rounds, expanding the labeled pool from 81 images (R0) to 481 images (R8) within the actual expert-time budget of two ophthalmologists. Across three independent random seeds, Random sampling performs best at cold start (mean mAP@50 0.14→0.25 over R0–R1), whereas Hybrid-Diversity reaches the highest mAP@50, Precision, and Recall by R7 (431 images; mAP@50 0.40, Precision 0.55, Recall 0.41), with MC Dropout close behind; by R8, the unlabeled pool is exhausted, so every strategy has selected the same 481 images and all converge to the same final model.
A clinician crossover analysis of 36 paired clinical images, controlling for per-clinician speed bias and per-image difficulty bias, shows no statistically significant difference in overall per-image labeling time between AI-assisted and manual annotation (p = 0.52) but a statistically significant increase in confirmed lesion detections under AI assistance (p = 0.0058). This increase is driven predominantly (84–100% of the net increase) by microaneurysms, the lesion type most prone to being missed unaided. The results indicate two things. First, under expert-budget constraints, AL strategy choice should be staged: random sampling for cold start, then uncertainty-and-diversity sampling once the model has matured. Second, AI assistance trades a modest, lesion-burden-dependent time cost for a measurable gain in the sensitivity of microaneurysm detection.
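
The staged recommendation can be illustrated with a minimal sketch. The abstract does not specify how Hybrid-Diversity is computed, so everything here is an assumption made for illustration: the function name `select_batch`, the `cold_start_rounds` threshold, the 3x uncertainty shortlist, and the greedy farthest-point step are hypothetical and are not the authors' implementation. The sketch draws a random batch in early rounds and afterwards shortlists the lowest-confidence images and spreads the batch across feature space.

```python
import random
from math import dist


def select_batch(round_idx, confidences, features, batch_size,
                 cold_start_rounds=2, seed=0):
    """Return sorted indices of unlabeled images to send for annotation.

    confidences: mean detection confidence per image (lower = more uncertain).
    features:    one feature vector (sequence of floats) per image.
    cold_start_rounds is a hypothetical threshold, not a value from the paper.
    """
    pool = list(range(len(confidences)))
    batch_size = min(batch_size, len(pool))
    if batch_size == 0:
        return []

    # Stage 1 (cold start): the model is too weak to rank images, so sample at random.
    if round_idx < cold_start_rounds:
        return sorted(random.Random(seed).sample(pool, batch_size))

    # Stage 2: shortlist the most uncertain images (assumed 3x the batch size).
    shortlist = sorted(pool, key=lambda i: confidences[i])[: 3 * batch_size]

    # Greedy farthest-point selection keeps the batch diverse in feature space.
    chosen = [shortlist[0]]
    while len(chosen) < batch_size:
        candidates = [i for i in shortlist if i not in chosen]
        best = max(
            candidates,
            key=lambda i: min(dist(features[i], features[j]) for j in chosen),
        )
        chosen.append(best)
    return sorted(chosen)
```

With the reported pool sizes (81 images at R0, 431 at R7, 481 at R8), the increments are consistent with batches of about 50 images per round, so a call such as `select_batch(r, confidences, features, 50)` would match that schedule under these assumptions.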
