Physiology-Driven Inference Using Large Language Models Enables Probabilistic Assessment of Huntington’s Disease from Smartphone Eye-Movement Data
Leonardo Eleuterio Ariello, Kelvin Wang, David Newman-Toker, Jee Bang, David P. W. Rastall

Background: Artificial intelligence in medicine has largely relied on supervised training of disease-specific models, limiting scalability in conditions where labeled data are scarce. Large language models (LLMs), which encode broad medical knowledge through large-scale pretraining, offer an alternative paradigm in which structured physiological measurements can be interpreted directly without task-specific model training.

Objective: To evaluate whether smartphone-derived ocular motor biomarkers can be translated into clinically meaningful probabilistic assessments of Huntington's disease (HD) using general-purpose LLMs operating as inference engines.

Methods: In this prospective proof-of-concept study, 26 participants (13 with genetically confirmed HD and 13 age-matched controls) completed a standardized ocular motor assessment using a custom smartphone application. Quantitative eye-movement metrics were validated against expert neurologist ratings. Structured physiological features were then provided to four general-purpose LLMs without task-specific training or diagnostic labels, and the models generated an AI-Assigned HD Probability Score (HAIPS). Discriminative performance and associations with clinical severity measures were evaluated.

Results: Smartphone-derived ocular motor metrics showed strong agreement with clinician assessments (Spearman ρ = 0.76–0.95; all p < 0.001), confirming preservation of clinically meaningful physiological signals. LLM-derived HAIPS distinguished HD from controls with high accuracy (AUC 0.879–0.944), with no significant differences across models. Discrimination was statistically equivalent to a supervised logistic regression model trained on the same features. HAIPS correlated strongly with established measures of disease severity, including cognitive (MoCA, ρ = −0.86), functional (TFC, ρ = −0.74), and motor impairment (UHDRS, ρ = 0.85) (all p ≤ 0.003).

Conclusions: Structured ocular motor biomarkers acquired using a consumer smartphone can be translated into clinically meaningful probabilistic assessments of HD by general-purpose LLMs without disease-specific model training. These findings support a framework in which physiologically grounded digital biomarkers are coupled with general-purpose inference models, potentially enabling scalable assessment in rare neurological diseases where labeled data are limited.
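The abstract reports two kinds of summary statistic: AUC for separating HD from controls using a probability score, and Spearman ρ for agreement with clinician ratings and severity scales. It does not say how these were computed. The sketch below, using only the Python standard library, illustrates the standard definitions under that assumption. AUC is computed as the Mann-Whitney probability that a randomly chosen HD score exceeds a randomly chosen control score, with ties counting one half. Spearman ρ is computed as the Pearson correlation of average ranks. The function names and the example scores are hypothetical and are not study data. The sketch omits p-values and between-model AUC comparisons.

```python
from itertools import product


def auc(scores_pos, scores_neg):
    """Mann-Whitney AUC: P(random positive score > random negative score), ties = 1/2."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical probability scores for illustration only (not study data).
hd_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.1, 0.3, 0.5, 0.2]
print(auc(hd_scores, control_scores))  # 15 of 16 pairs ordered correctly -> 0.9375
```

The rank-based AUC equals the area under the empirical ROC curve. For this reason, it depends only on how the scores are ordered and not on whether the LLM-generated probabilities are calibrated.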