RadOnc-GPT (gpt-4o) versus human data extraction for prostate cancer clinical research.

DOI: 10.1200/jco.2025.43.5_suppl.425 ISSN: 0732-183X

Mohammad Javad Namazi, Mariana Borras Osorio, Jason M Holmes, David M. Routman, Daniel Ebner, Peilong Wang, Wei Liu, Mark Raymond Waddle

Abstract 425

Background: Prostate cancer (PC) ranks among the most prevalent cancers in men, and its incidence is rising due to factors such as aging populations and improved screening techniques. Effective data extraction from clinical records is crucial for advancing research and improving patient care; however, traditional manual extraction is time-consuming and error-prone. This study evaluates RadOnc-GPT against human annotation for extracting essential clinical information from prostate cancer clinical records.

Methods: RadOnc-GPT is a retrieval-augmented generation (RAG)-based chatbot with access to a variety of patient data, including treatment details, radiology reports, and clinical notes. It uses OpenAI's gpt-4o model. Patients who received curative-intent radiotherapy as initial treatment for PC at our institution between 2014 and 2023 were randomly selected for evaluation. Relevant clinical factors were extracted by an expert human annotator into our on-site REDCap database, and the same variables were then extracted using RadOnc-GPT. To evaluate the AI system, RadOnc-GPT's output was compared against the ground truth (human annotation) using accuracy, recall, and F1 score.

Results: A total of 382 PC patients were included in the cohort for data extraction. The clinical variables extracted were treatment intent, NCCN risk category, T and N stage, Gleason score, radiation treatment volume, and androgen deprivation therapy (ADT) use. The overall performance of RadOnc-GPT was excellent, with the highest accuracy achieved for treatment intent (99.7%), Gleason score (98.4%), and treatment volume (90.1–99.5%).

Conclusions: RadOnc-GPT performed excellently at extracting relevant clinical details from the electronic medical record and radiation oncology information systems, with a dramatic reduction in time and cost compared with manual review. Large language models (LLMs) could be highly useful for large-scale data extraction in PC research, offering a promising way to automate accurate data extraction.
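
The comparison described in the Methods can be sketched for a single extracted variable as follows. This is a minimal illustration, not the study's evaluation code: the function name `extraction_metrics`, the example labels, and the choice to macro-average recall and F1 across classes are all assumptions, since the abstract does not state how multi-class variables such as NCCN risk category were scored. Accuracy here is simply correct / total.

```python
def extraction_metrics(truth, predicted):
    """Score model-extracted labels against human ground truth for one variable.

    Hypothetical helper. Assumption (not stated in the abstract): recall and
    F1 are macro-averaged over the classes that occur in the ground truth.
    """
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and of equal length")

    pairs = list(zip(truth, predicted))
    total = len(pairs)
    correct = sum(t == p for t, p in pairs)

    recalls, f1s = [], []
    for label in sorted(set(truth)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        # tp + fn > 0 because every label here occurs in the ground truth.
        recall = tp / (tp + fn)
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        f1s.append(f1)

    return {
        "correct": correct,
        "total": total,
        "accuracy": correct / total,
        "macro_recall": sum(recalls) / len(recalls),
        "macro_f1": sum(f1s) / len(f1s),
    }


if __name__ == "__main__":
    # Hypothetical, simplified risk labels for illustration only; not study data.
    human = ["low", "intermediate", "intermediate", "high"]
    model = ["low", "intermediate", "high", "high"]
    print(extraction_metrics(human, model)["accuracy"])  # 0.75
```

Per-class averaging keeps rare categories from being swamped by common ones; a study could equally report micro-averaged or per-class figures, which this sketch does not attempt to reproduce.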

[Table with columns Correct (n), Total (n), and Accuracy; row values not recoverable]