Improving dementia phenotype prediction with functional mapping of genetic data in non‐European ancestries
Mingzhou Fu, Bogdan Pasaniuc, Keith Vossel, Timothy S Chang
Abstract
Background
Many common genetic variants have been associated with dementia, but their ability to predict dementia prospectively is unknown, especially in non‐European ancestral populations. We sought to optimize dementia risk prediction models with machine learning in individuals of diverse genetic ancestries.
Method
Participants were selected from the University of California Los Angeles Health System and had electronic health records (EHRs) linked to genetic data. The dementia phenotype was defined using diagnosis codes and mapped phecodes. Polygenic risk scores (PRSs) were built using summary statistics from different population‐based genome‐wide association studies (GWASs). Independent significant and functionally mapped single nucleotide polymorphisms (SNPs) were identified using the FUMA tool. Logistic regression models were built with 1) APOE status, 2) a single PRS, or 3) multiple PRSs. A model with age and sex only served as the baseline. We trained LASSO models to select dementia risk SNPs from weighted SNP‐level genetic data. We compared the predictive performance of these models under five‐fold cross‐validation using the area under the precision‐recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC). Finally, the identified risk SNPs were mapped to genes for biological interpretation.
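The modelling and evaluation steps described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the genotype data are synthetic, the function names (`fit_l1_logistic`, `stratified_folds`, `cross_validate`) and the hyperparameters (`lam`, `lr`, `epochs`) are assumptions made for illustration, covariates such as age and sex are omitted, and AUPRC is approximated by average precision. It shows how an L1 penalty can shrink uninformative SNP weights to zero and how AUROC and AUPRC can be averaged over five stratified folds.

```python
import math
import random

def predict(w, b, x):
    """Logistic model probability for one individual's SNP dosages."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(w, t):
    """Proximal step for the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.02, lr=0.1, epochs=150):
    """L1-penalised (LASSO) logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalised
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def auroc(y, s):
    """Probability that a random case outscores a random control (ties count half)."""
    pos = [si for si, yi in zip(s, y) if yi == 1]
    neg = [si for si, yi in zip(s, y) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def auprc(y, s):
    """Average precision: mean precision at the rank of each true case."""
    order = sorted(range(len(y)), key=lambda i: -s[i])
    tp, total = 0, 0.0
    for rank, i in enumerate(order, 1):
        if y[i] == 1:
            tp += 1
            total += tp / rank
    return total / tp

def stratified_folds(y, k=5, seed=0):
    """Split indices into k folds, keeping the case/control ratio similar in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def cross_validate(X, y, k=5):
    """Return one (AUROC, AUPRC) pair per held-out fold."""
    scores = []
    for held_out in stratified_folds(y, k):
        test = set(held_out)
        train = [i for i in range(len(y)) if i not in test]
        w, b = fit_l1_logistic([X[i] for i in train], [y[i] for i in train])
        y_te = [y[i] for i in held_out]
        s_te = [predict(w, b, X[i]) for i in held_out]
        scores.append((auroc(y_te, s_te), auprc(y_te, s_te)))
    return scores

# Synthetic example: 200 individuals, 8 SNP dosages (0/1/2), two truly causal SNPs.
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]) for _ in range(8)] for _ in range(200)]
true_w = [1.0, 0.8] + [0.0] * 6  # hypothetical effect sizes
y = [int(rng.random() < predict(true_w, -2.0, x)) for x in X]

scores = cross_validate(X, y)
print("mean AUROC:", sum(a for a, _ in scores) / len(scores))
print("mean AUPRC:", sum(p for _, p in scores) / len(scores))
```

AUPRC is reported alongside AUROC because, when cases are rare, precision-recall summaries are more sensitive to how well the model ranks the minority class. In practice an established statistics library would normally replace the hand-written optimiser and metrics shown here.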
Result
The final Hispanic Latinx American ancestry sample included 975 individuals, of whom 9.44% (N = 92) were dementia cases. The model with LASSO‐selected, functionally mapped SNPs from GWASs of multiple neurodegenerative diseases outperformed all other genetic risk score models (AUPRC: 0.39, AUROC: 0.87), a 19.6% increase in AUPRC over the best PRS model and a 26.9% increase over the baseline model. The best‐performing LASSO model included nine SNPs, mostly located in the 19q13, 4p15, and 17q21 regions. Gene mapping recovered well‐known dementia risk genes and also identified risk genes for other neurodegenerative diseases.
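The percentages in this paragraph read as relative changes rather than absolute differences. The short Python sketch below, using the hypothetical helper names `relative_increase` and `implied_reference`, checks the case proportion and back-calculates the comparison AUPRC values implied by the reported gains. The back-calculated values are approximations only, since the abstract reports rounded figures and does not state the comparison AUPRCs directly.

```python
def relative_increase(reference, new):
    """Relative change of `new` over `reference`, as a fraction."""
    return (new - reference) / reference

def implied_reference(new, rel_increase):
    """Reference value implied by a reported relative increase."""
    return new / (1.0 + rel_increase)

# Case proportion reported in the abstract: 92 cases among 975 individuals.
case_fraction = 92 / 975
print(f"case proportion: {case_fraction:.2%}")

# Approximate comparison AUPRCs implied by the rounded figures (not reported values).
best_lasso_auprc = 0.39
print("implied best-PRS AUPRC:", implied_reference(best_lasso_auprc, 0.196))
print("implied baseline AUPRC:", implied_reference(best_lasso_auprc, 0.269))
```

Because the case proportion is below 10%, a classifier that ranks at random would be expected to reach an AUPRC near that proportion, which gives context for an AUPRC of 0.39.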
Conclusion
We demonstrated that building genetic risk scores with trans‐ancestry GWAS and selecting potential causal SNPs can be beneficial for dementia prediction in non‐European ancestry populations. Further training using machine learning methods with functionally mapped SNP‐level genetic data showed significant improvement in predictive power compared to PRS models. Genetic risks of multiple neurodegenerative diseases may contribute to the predisposition to dementia.