DOI: 10.1136/bmjhci-2025-101803 ISSN: 2632-1009

Alzheimer’s disease risk prediction from clinical and social determinants of health: a machine learning cohort study in UK Biobank

Junming Hu, Simon Lu, Qi Zhang, Kehan Qian, Habbiburr Rehman, Congcong Zhu, John Farrell, Julio E Castrillon-Candas, Rhoda Au, Lindsay A Farrer, Wei Q Qiu, Jinying Chen, Xiaoling Zhang

Objectives

Social determinants of health (SDOH) may improve Alzheimer’s disease (AD) risk prediction by capturing upstream contextual risk beyond routinely measured clinical variables. We aimed to develop and validate an accurate, interpretable machine-learning pipeline for AD risk prediction in UK Biobank using routinely collected data.

Methods

Using data from 13 076 participants in the UK Biobank, we developed an automated machine-learning pipeline for AD risk prediction with feature selection and a C5.0 boosted-tree classifier. Data were split into training, development and test sets (7:2:1); missing values were imputed in the training data only, and feature selection, tuning and threshold calibration were performed using the training/development data, with final evaluation on the independent test set. Internal validation used repeated subsampling without replacement.
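
As an illustration of the data partitioning and internal validation described above, the following is a minimal Python sketch, not the authors' implementation. Stratifying the 7:2:1 split by outcome, the 80% subsample fraction, the number of repeats, and the function names `split_7_2_1` and `repeated_subsamples` are all assumptions made for this example; the abstract does not specify them.

```python
import random
from collections import defaultdict


def split_7_2_1(labels, seed=0):
    """Return (train, dev, test) index lists in roughly 7:2:1 proportion.

    Stratifying by outcome label is an assumption of this sketch,
    not a detail stated in the abstract.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, dev, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(0.7 * len(indices))
        n_dev = round(0.2 * len(indices))
        train += indices[:n_train]
        dev += indices[n_train:n_train + n_dev]
        test += indices[n_train + n_dev:]  # remainder, about 10%
    return train, dev, test


def repeated_subsamples(indices, fraction=0.8, repeats=5, seed=0):
    """Yield subsamples drawn without replacement from `indices`.

    `fraction` and `repeats` are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    k = int(fraction * len(indices))
    for _ in range(repeats):
        yield rng.sample(indices, k)  # random.sample never repeats an element


# Toy labels matching the cohort counts in the abstract: 927 AD cases of 13 076.
labels = [1] * 927 + [0] * (13076 - 927)
train, dev, test = split_7_2_1(labels)
```

Under this scheme, imputation parameters, feature selection, tuning and threshold calibration would touch only `train` and `dev`, while `test` is held back for the final evaluation.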

Results

During up to 16 years of follow-up, 927 participants developed AD. Feature selection reduced 3590 variables to 26 predictors spanning age, APOE4, SDOH, medical history and routine clinical measures. The final model showed good discrimination (area under the precision–recall curve 0.89) and adequate calibration (Hosmer-Lemeshow p=0.71), with stable performance under repeated subsampling. Sex-stratified models showed similar patterns.

Discussion

SDOH contributed useful predictive information, but their associations should be interpreted as predictive rather than causal and may reflect socioeconomic confounding and healthcare access.

Conclusions

This model could support scalable AD risk screening using routinely collected data, but external validation and recalibration in non-UK populations are needed before broader application.
