Potential and Pitfalls of Multimodal Large Language Models in Cerebral Palsy Hip Surveillance: A Radiographic Interpretation Study Assessing Educational Utility
Yman Kamgaing Wappi, Austin Cheng, Alexander Dymond, Soroush Baghdadi, William Oppenheim
Background/Objectives: Cerebral palsy (CP) hip displacement requires longitudinal surveillance, frequently imposing significant burden on caregivers. While Multimodal Large Language Models (MLLMs) offer a potential solution to the health literacy gap, their accuracy in interpreting pediatric pelvic radiographs remains unproven. This study evaluates the effectiveness and safety of MLLMs in addressing caregiver concerns regarding CP hip management. Methods: Fifteen deidentified pediatric pelvic radiographs representing a spectrum of hip displacement severities were processed through three MLLMs: GPT-4o, Claude 3.5, and Gemini 1.5 Pro. Nine standardized caregiver prompts (n = 95 total responses per model) were utilized to simulate common clinical queries. Outcome measures included response word count, interactive characteristics, frequency of medical disclaimers, and diagnostic accuracy. Results: Quantitative analysis revealed that Claude 3.5 produced significantly shorter responses compared to other models (p < 0.01). GPT-4o demonstrated the highest safety alignment, with a 96.9% disclaimer rate, significantly exceeding Claude (60.0%) and Gemini (76.8%) (p = 0.03). Diagnostic “hallucinations” were observed, notably Claude misidentifying non-operative cases as bilateral hip replacements. While management recommendations were clinically relevant, they remained generic rather than patient-specific, failing to measure or apply migration percentage thresholds. Encouragingly, all models consistently directed users to consult an orthopaedic surgeon. Conclusions: MLLMs represent an opportunity to enhance health literacy by providing accessible management summaries and emphasizing professional consultation. However, significant radiographic hallucinations and a lack of specific, evidence-based guidance preclude their use as standalone diagnostic tools. Currently, MLLMs should be viewed as educational adjuncts requiring expert oversight in the pediatric orthopaedic care continuum.