DOI: 10.1200/jco.2026.44.19_suppl.23 ISSN: 0732-183X

Automated tumor content analysis from whole slide images using deep learning models for NGS quality control in precision oncology.

Satya Prakash Khuntia, Hitesh Goswami, Kshitij Rishi, Shefali Karve, Puneet Pantane, Prabakar Sampath, Anjali Kulkarni, Praveen Kumar Jha

23

Background: Accurate tumor cell abundance (TCA) estimation is critical for ensuring sample adequacy prior to next-generation sequencing (NGS) in precision oncology workflows. Current standard practice requires manual pathologist review of hematoxylin and eosin (H&E)-stained whole slide images (WSIs), creating throughput bottlenecks and inter-observer variability. We developed and evaluated a series of lightweight CNN models for cancer-agnostic TCA estimation to automate NGS quality control.

Methods: Over 1,000 WSIs spanning multiple tumor types were processed using Otsu-based tissue segmentation. Tile-level pseudo-labels were generated using an ensemble of CellViT models, with slide-level TCA computed as the minimum of area-based and cell-based estimates across the ensemble. All slide-level TCA labels were independently validated by board-certified pathologists, with discordant cases reviewed and reconciled prior to model training. Four CNN architectures were evaluated: MobileNetV3, GoogLeNet, ResNet, and DenseNet. Models were trained via knowledge distillation from the CellViT ensemble and validated against pathologist-confirmed TCA estimates. Performance was assessed using mean absolute error (MAE), area under the ROC curve (AUC), and clinical concordance at a 20% TCA threshold for NGS adequacy determination.

Results: All four models achieved AUC greater than 0.91 for NGS adequacy classification at the 20% TCA threshold, with MAE under 7% against pathologist-confirmed reference standards across all tumor types evaluated. Higher-capacity architectures demonstrated marginal accuracy gains, while MobileNetV3 achieved comparable performance at significantly faster inference speed, processing a complete WSI in under 5 seconds.

Conclusions: Lightweight CNN models trained via knowledge distillation from a CellViT ensemble and pathologist-confirmed labels can achieve clinically acceptable cancer-agnostic TCA estimation suitable for automated NGS QC workflows. Compared to large Vision Transformer models requiring substantial computational resources, the distilled CNN models offer dramatically reduced inference cost while preserving clinical-grade accuracy. These models have the potential to substantially reduce pathologist workload in molecular pathology laboratories while maintaining the accuracy required for NGS adequacy determination, with direct applicability to precision oncology treatment selection pipelines and broad accessibility across resource-varied clinical settings.
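
For readers who want to see how the label aggregation and the three reported metrics fit together, the following is a minimal sketch and not the authors' implementation. It assumes the slide-level label is the minimum over all area-based and cell-based estimates from the ensemble, that a slide counts as adequate when TCA is at or above 20%, and that AUC is computed by pairwise ranking of predicted TCA against the pathologist-confirmed adequacy call. All function names and the numbers in the usage lines are hypothetical illustrations, not study data.

```python
from statistics import mean

# Cutoff stated in the abstract; treating "at or above" as adequate is an assumption.
ADEQUACY_THRESHOLD = 20.0  # percent TCA


def slide_tca(area_estimates, cell_estimates):
    """Conservative slide-level label: the minimum over every area-based and
    cell-based estimate produced by the ensemble (assumed interpretation)."""
    return min(list(area_estimates) + list(cell_estimates))


def mae(predicted, reference):
    """Mean absolute error in TCA percentage points."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def adequacy_auc(predicted, reference, threshold=ADEQUACY_THRESHOLD):
    """AUC for ranking adequate slides above inadequate ones.

    Uses the pairwise (Mann-Whitney) formulation: the fraction of
    (adequate, inadequate) pairs in which the adequate slide receives the
    higher predicted TCA, with ties counted as half.
    """
    pos = [p for p, r in zip(predicted, reference) if r >= threshold]
    neg = [p for p, r in zip(predicted, reference) if r < threshold]
    if not pos or not neg:
        raise ValueError("need at least one adequate and one inadequate slide")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def concordance(predicted, reference, threshold=ADEQUACY_THRESHOLD):
    """Fraction of slides where model and reference agree on adequacy."""
    agree = sum((p >= threshold) == (r >= threshold)
                for p, r in zip(predicted, reference))
    return agree / len(reference)


# Hypothetical values for illustration only.
label = slide_tca(area_estimates=[30.0, 28.0], cell_estimates=[25.0, 27.0])
reference = [5.0, 15.0, 25.0, 40.0, 60.0]   # pathologist-confirmed TCA (%)
predicted = [8.0, 22.0, 21.0, 35.0, 66.0]   # distilled CNN output (%)
print(label, mae(predicted, reference),
      adequacy_auc(predicted, reference), concordance(predicted, reference))
```

Taking the minimum yields a conservative label: a slide is called adequate only if every estimate in the ensemble clears the threshold, which biases errors toward flagging borderline samples for review rather than passing them to sequencing.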
