ID #995 Automated Segmentation Performance and Uncertainty in Pediatric Diffuse Midline Gliomas using Imaging Biomarkers

DOI: 10.1093/neuped/wuag026.436 ISSN: 2977-4454

Daria Laslo, Nina Baumgartner, Laura Fontanesi, Dror Suhami, Nabaan Mir, Atlas Haddadi Avval, Deep Gandhi, Ariana Familiar, Anahita Kazerooni, Zhifan Jiang, Abhijeet Parida, Marius Linguraru, Benjamin Kann, Sabine Mueller, Arzu Coeltekin, Catherine Jutzeler, Andreas Rauschecker, Sarah Brüningk

Abstract

Background

MRI-based tumor segmentation could greatly support clinical assessment of diffuse midline glioma (DMG), yet clinical translation of automated methods remains constrained by occasional model failures, and both the segmentation performance required for clinical utility and the value of uncertainty estimates in detecting meaningful errors remain unclear. We systematically evaluate segmentation performance prediction, response label stability, and uncertainty estimation.

Methods

The whole tumor was segmented on pre- and post-therapy multi-contrast MRI scans (n = 403) from a multicentric, international cohort of 107 DMG patients. Segmentations produced by a state-of-the-art deep learning model were dichotomized by Dice score into acceptable (Dice > 0.8) and poor (Dice < 0.8). We analyzed how well segmentation performance could be classified from image-derived features (imaging metadata, radiomic features, and 3D brain MRI foundation model embeddings), and compared response assessments derived from manual versus automated segmentations (n = 51 patients with longitudinal follow-up). In an eye-tracking sub-study, we further quantified human contour uncertainty (36 annotators, 12 slices) and contextualized it with observer gaze patterns.
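As an illustration of the Dice-based dichotomization described in the Methods, the following minimal Python sketch computes the Dice score between an automated and a manual binary mask and assigns the acceptable/poor label. The function names `dice_score` and `label_segmentation` are hypothetical and not taken from the study. Two points are assumptions made here because the abstract does not specify them: two empty masks are treated as perfect agreement, and a Dice score of exactly 0.8 is assigned to the poor class.

```python
import numpy as np


def dice_score(pred, ref):
    """Dice overlap between two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    # Dice = 2 * |A and B| / (|A| + |B|)
    return 2.0 * np.logical_and(pred, ref).sum() / total


def label_segmentation(dice, threshold=0.8):
    """Dichotomize a Dice score into 'acceptable' or 'poor'."""
    # Assumption: Dice exactly equal to the threshold is labeled 'poor';
    # the abstract only defines Dice > 0.8 and Dice < 0.8.
    return "acceptable" if dice > threshold else "poor"


# Toy 2D example (not study data): the prediction covers one extra voxel.
auto_mask = [[1, 1], [0, 0]]
manual_mask = [[1, 0], [0, 0]]
score = dice_score(auto_mask, manual_mask)
print(round(float(score), 3), label_segmentation(score))
```

In practice the same functions would be applied to 3D whole-tumor masks; the arithmetic is unchanged because the masks are only summed and intersected voxel-wise.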

Results

Despite generally good performance (median Dice = 0.77-0.81), auto-segmented volumes altered 20% of trajectory-based manual response labels (n = 10), predominantly misclassifying stable or progressive disease as partial response because post-treatment scans were undersegmented. Segmentation performance was best classified using a combination of whole-image foundation model embeddings and segmented tumor volume (ROC AUC = 0.81 ± 0.05). Segmentation error correlated with human contour uncertainty (|r| = 0.9), supporting model-based uncertainty as a proxy for annotation difficulty. Image-derived attention features from deeper encoder layers explained substantially more uncertainty variance than eye-tracking features alone (R²: 24% vs. 2%). Human gaze attention overlapped most with U-Net bottleneck activations (Dice = 0.6). A combined model integrating model attention and human visual behavior explained 39% of uncertainty variance.

Conclusions

Jointly, these results support the integration of performance- and uncertainty-aware segmentation frameworks to enable safe clinical deployment, scalable quality assurance, and reliable endpoint extraction from automated tumor segmentations in DMG.