DOI: 10.1093/radadv/umag028 ISSN: 2976-9337

Multicenter AI- versus Expert-Assisted RECIST Target Lesion Measurements in Follow-Up Body CT of Cancer Patients

Max J J de Grauw, Max Westphal, Ewoud J Smit, Ernst Th Scholten, Tanja Loßau, Jan Moltz, Silvia Bottazzi, Renato Cuocolo, Anna D’Angelo, Ajo B George, Hugo C van Heusden, Adriano Liguori, Valentina Longo, Luigi Mannacio, Nguyen T N Minh, Ahmed E Othman, Andrea Ponsiglione, Joey Roosen, Maarten de Rooij, Luca Russo, Steven Schalekamp, Miranda Snoeren, Arnaldo Stanzione, Sebastian Steinmetz, Carlijn I Verkroost, Bastiaan Vernhout, Lina Xu, Derya Yakar, Matthieu J C M Rutten, Bram van Ginneken, Mathias Prokop, Alessa Hering

Abstract

Background

Manual lesion measurements remain the standard for assessing oncologic treatment response, despite being time-consuming and prone to substantial inter-reader variability, which may lead to inconsistent response classification and subsequent variability in treatment decisions.

Purpose

To evaluate the impact of an artificial intelligence (AI) system for assisted lesion measurement on reading time and measurement consistency in follow-up CT examinations using the Response Evaluation Criteria in Solid Tumors (RECIST 1.1).

Methods and Materials

In this retrospective reader study, follow-up chest-abdomen-pelvis CT examinations from 212 oncology patients collected at two Dutch hospitals were assessed by 23 readers (15 radiologists, 8 residents) recruited from 11 international institutions under three conditions: unassisted, AI-assisted, and expert-assisted (using a prior radiologist’s unassisted measurement). To prevent bias related to the source of the measurements, readers were informed that all support was AI-derived. Primary outcomes were reading time to completion and inter-observer measurement variability. For each outcome, a Bayesian generalized linear mixed model was used to analyze the results.

Results

AI assistance significantly reduced per-patient reading time versus unassisted reading (-36.0 s; 95% CI: -53.0, -22.1). At the lesion level, it was associated with a small increase in variability relative to the expert-derived reference standard (1.32 mm; 95% CI: 0.83, 1.91). At the patient level, AI assistance did not meaningfully affect the change in the sum of longest diameters (SLD; -0.38 mm; 95% CI: -1.57, 0.72), but increased RECIST outcome agreement by 7.7% (95% CI: 2.8, 12.7) compared with unassisted reading. Expert-assisted reading yielded an even larger increase in inter-reader agreement (13.3%; 95% CI: 8.6, 18.1).

Conclusion

AI assistance reduced reading time and improved patient-level RECIST agreement, despite a small increase in lesion-level measurement variability. These findings suggest that AI-assisted RECIST assessment may improve workflow and response classification consistency, while also providing a benchmark from expert-assisted reading for future AI development.
