DOI: 10.1200/jco.2026.44.19_suppl.6 ISSN: 0732-183X

Patient preference for large language model (LLM)–optimised oncology trial information: A randomised controlled crossover study.

Minh Tran, Kate Saw, Jeremy Mo, Emma-Kate Carson, Kathryn MacDonald, Mary Lloyd, Graham Rossiter, Marisa Crawford, Carolyn Mazariego, Ann Dadich, Peta Brydon, Jo River, Rachel Fitz-Gerald Dear, Elgene Lim, Frank Po-Yen Lin

6

Background: Clinical trial participation in oncology is frequently hindered by complex, jargon-laden information that impairs comprehension by patients and caregivers. LLMs offer a scalable way to simplify technical text, but their utility in oncology has not been evaluated.

Methods: We conducted a randomised, controlled, three-period crossover study at two tertiary breast cancer outpatient clinics in Sydney, Australia. Eligible patients and caregivers evaluated trial descriptions in three formats: standard ClinicalTrials.gov text (Control); LLM-optimised text generated using zero-shot prompting with GPT-4 (LLM); and LLM-optimised text further refined by an oncologist (LLM+E). Latin square randomisation controlled for order effects. The primary endpoint was the Global Preference Score (GPS), operationalised as the minimum Likert rating (1-5 scale) across five sections (Title, Summary, Intervention, Description, Eligibility), reflecting that incomprehensibility of any one component degrades the utility of the whole document. The sample size (n ≥ 18) was determined by Monte Carlo simulation to detect a minimum 1-point Likert shift with 90% power (α = 0.05); the recruitment target was 36, allowing for 50% attrition. The primary analysis used a Friedman rank sum test blocked by participant, together with a cumulative link mixed model (CLMM) with random intercepts for participants to adjust for age, education, and first language. All LLM-generated text (LLM±E) was vetted for accuracy of content by ≥1 medical oncologist.

Results: Between September and December 2025, 30 of 31 recruited participants provided crossover responses for the primary analysis (401 valid responses across 5 sections, 11% invalid). The mean age was 53 years (95% CI, 44-63); most respondents were female (n = 27, 90%), native English speakers (n = 24, 80%), and holders of tertiary qualifications (n = 17, 57%); 20 respondents (67%) were patients. The median GPS was 3.0 (IQR 2.5) for Control, 4.0 (IQR 2.0) for LLM, and 3.0 (IQR 1.5) for LLM+E. LLM achieved a significantly higher GPS than Control (p = 0.02, Friedman test). The multivariable CLMM confirmed increased odds of higher preference ratings for LLM text vs. Control (OR 1.83, 95% CI 1.01–3.35, adjusted p = 0.048). Paradoxically, LLM+E showed no improvement over Control (adjusted p = 0.826, r = 0.063). Native English speakers rated the content more critically than non-native speakers (OR 0.092, p = 0.009), independent of arm allocation.

Conclusions: LLM optimisation improves patient preference for trial information compared with standard registry descriptions, supporting further evaluation of its use in preparing patient-facing materials. Expert oncologist revision may inadvertently reintroduce complexity, negating the accessibility gains provided by LLMs; human-in-the-loop workflows therefore need care to preserve an accessible linguistic style.
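The two design elements described in the Methods, the minimum-rating definition of the GPS and the Latin square ordering of the three formats, can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the cyclic construction of the square, and the rule of skipping missing or out-of-range ratings are all assumptions, since the abstract does not state how invalid responses were handled or how the square was generated.

```python
from typing import Dict, List, Optional, Sequence

SECTIONS = ("Title", "Summary", "Intervention", "Description", "Eligibility")
FORMATS = ("Control", "LLM", "LLM+E")


def global_preference_score(ratings: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the minimum valid Likert rating (1-5) across the five sections.

    Assumption: missing or out-of-range ratings are skipped. The abstract
    does not describe the handling of invalid responses.
    """
    valid = []
    for section in SECTIONS:
        rating = ratings.get(section)
        if isinstance(rating, int) and 1 <= rating <= 5:
            valid.append(rating)
    return min(valid) if valid else None


def cyclic_latin_square(items: Sequence[str]) -> List[List[str]]:
    """Build an n x n cyclic Latin square (hypothetical construction).

    Each row is one presentation sequence; every item appears exactly once
    in each row and once in each period (column).
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Example: one poorly rated section sets the score for the whole document.
example = {"Title": 5, "Summary": 4, "Intervention": 2,
           "Description": 4, "Eligibility": 3}
gps = global_preference_score(example)  # 2

# Three presentation sequences for the three-period crossover.
sequences = cyclic_latin_square(FORMATS)
```

Taking the minimum rather than the mean means that a single poorly understood section sets the score for the whole document, which matches the rationale given for the endpoint.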
