DOI: 10.1200/jco.2025.43.5_suppl.425 ISSN: 0732-183X

RadOnc-GPT (gpt-4o) versus human data extraction for prostate cancer clinical research.

Mohammad Javad Namazi, Mariana Borras Osorio, Jason M Holmes, David M. Routman, Daniel Ebner, Peilong Wang, Wei Liu, Mark Raymond Waddle

425

Background: Prostate cancer (PC) ranks among the most prevalent cancers in men, and its incidence is rising with aging populations and improved screening. Effective data extraction from clinical records is crucial for advancing research and improving patient care; however, manual extraction is time-consuming and error-prone. This study evaluates RadOnc-GPT against human annotation for extracting essential clinical information from PC clinical records. Methods: RadOnc-GPT is a retrieval-augmented generation (RAG)-based chatbot with access to a variety of patient data, including treatment details, radiology reports, and clinical notes. RadOnc-GPT uses OpenAI’s gpt-4o model. Patients who received curative-intent radiotherapy as initial treatment for PC at our institution between 2014 and 2023 were randomly selected for evaluation. Relevant clinical factors were extracted by a human expert annotator into our on-site REDCap database. The same variables were then extracted using RadOnc-GPT. RadOnc-GPT output was compared against the human annotation, which served as ground truth, using accuracy, recall, and F1 score. Results: A total of 382 PC patients were included in the cohort for data extraction. The clinical variables extracted were treatment intent, NCCN risk category, T and N stage, Gleason score, radiation volume, and androgen deprivation therapy (ADT) use. The overall performance of RadOnc-GPT was excellent, with the highest accuracy achieved for treatment intent (99.7%), Gleason score (98.4%), and treatment volume (90.1-99.5%). Conclusions: RadOnc-GPT performed excellently at extracting relevant clinical details from the electronic medical record and radiation oncology information systems, with a dramatic reduction in time and cost compared with manual review. LLMs could be highly useful for large-scale data extraction in PC research, offering a promising way to automate accurate data extraction.

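| Variable | Correct (n) | Total (n) | Accuracy (%) | Recall (%) | F1 score (%) |
|---|---|---|---|---|---|
| RT with curative intent | 381 | 382 | 99.7 | 99.9 | 99.9 |
| Prostate cancer risk category | 345 | 382 | 90.3 | 90.4 | 95.1 |
| Primary tumor staging (T) at diagnosis | 368 | 382 | 96.3 | 96.3 | 97.9 |
| Regional nodal status (N) at diagnosis | 352 | 382 | 92.1 | 92.2 | 95.9 |
| Gleason score: primary | 376 | 382 | 98.4 | 98.5 | 99.2 |
| Gleason score: secondary | 372 | 382 | 97.4 | 97.4 | 98.7 |
| Gleason score: total | 376 | 382 | 98.4 | 98.5 | 99.2 |
| Treated areas: prostate | 380 | 382 | 99.5 | 99.4 | 99.7 |
| Treated areas: pelvic nodes | 344 | 382 | 90.1 | 90.1 | 95 |
| Treated areas: para-aortic nodes | 378 | 382 | 99 | 99.1 | 99.5 |
| Treated areas: distant site | 372 | 382 | 97.4 | 97.4 | 98.7 |
| ADT use with RT | 365 | 382 | 95.5 | 95.7 | 97.8 |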
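
The reported accuracy values can be checked against the correct and total counts. The sketch below assumes that accuracy is the simple fraction of patients for whom RadOnc-GPT matched the human annotation (correct / total); this reproduces the reported figures to within rounding. The abstract does not state how recall and F1 score were computed, so they are not reproduced here. The function and variable names are illustrative only.

```python
# Correct counts as reported in the results table.
# Every variable was scored on the same 382-patient cohort.
TOTAL = 382

correct_counts = {
    "RT with curative intent": 381,
    "Prostate cancer risk category": 345,
    "Primary tumor staging (T)": 368,
    "Regional nodal status (N)": 352,
    "Gleason score: primary": 376,
    "Gleason score: secondary": 372,
    "Gleason score: total": 376,
    "Treated areas: prostate": 380,
    "Treated areas: pelvic nodes": 344,
    "Treated areas: para-aortic nodes": 378,
    "Treated areas: distant site": 372,
    "ADT use with RT": 365,
}


def accuracy_pct(n_correct: int, n_total: int) -> float:
    """Assumed definition: percentage of patients with a matching extraction."""
    return 100.0 * n_correct / n_total


for variable, n_correct in correct_counts.items():
    print(f"{variable}: {accuracy_pct(n_correct, TOTAL):.1f}%")
```

Under this assumption, the lowest accuracy in the table (pelvic nodes, 344 of 382) corresponds to 38 discordant patients, and the highest (treatment intent, 381 of 382) to a single discordant patient.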
