DOI: 10.3390/diagnostics16132025 ISSN: 2075-4418

Development and Validation of an Interpretable Machine Learning Model Based on Routine Blood Biomarkers: For Predicting Age-Related Hearing Loss

Dan He, Yiting Liu, Jing Ke, Xu Jiang, Haiyu Ma, Ya Shi, Wei Yuan

Background/Objectives: Age-related hearing loss (ARHL) is a common sensory impairment in older adults, and early prediction and intervention are crucial for improving their quality of life. This study aimed to develop and validate an interpretable machine learning model that predicts ARHL risk from routine blood biomarkers.

Methods: A total of 542 participants were selected from the National Health and Nutrition Examination Survey (NHANES) database, comprising 271 ARHL patients and 271 healthy controls. The samples were randomly divided into a training set (50%) and two independent internal validation sets (25% each). The optimal predictive model (glmBoost+Stepglm[forward]) was selected through systematic comparison of 113 machine learning algorithm combinations, and the SHAP method was used for feature interpretation. To evaluate the model’s generalization ability, external validation was performed on a cohort of 92 cases from Chongqing People’s Hospital. In addition, an openly accessible interactive prediction web page was developed with the R Shiny framework to support real-time clinical risk assessment and visual interpretation.

Results: The model achieved an AUC of 0.948 in the training set and AUCs of 0.893 and 0.945 in the two internal validation sets, with an overall accuracy of 86.3%. In the external validation cohort (a limited sample of 92 cases from a single center), the model maintained good performance, with an AUC of 0.839 (95% CI: 0.750–0.918) and an accuracy of 77.2%. The model identified nine key predictive features; according to SHAP interpretability analysis, the top three were glycated hemoglobin (HbA1c), mean corpuscular volume (MCV), and blood glucose.

Conclusions: This study developed and validated an interpretable machine learning model based on routine blood biomarkers for community-based risk stratification of ARHL. The model demonstrated robust performance in internal and external validations, including an age-matched elderly subgroup. An interactive web tool was developed to facilitate real-time risk assessment. Although the model is intended as a prescreening tool for large-scale populations rather than a diagnostic test for age-matched individuals, it provides a novel approach for early identification of individuals at higher risk of ARHL and offers insights into the systemic pathogenesis of the condition.
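The evaluation design summarized in the abstract (a 50%/25%/25% random split and an AUC reported with a 95% confidence interval) can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not say whether the split was stratified or how the interval was computed, so the class-stratified split and the percentile bootstrap below are assumptions, and the function names `stratified_split`, `auc`, and `bootstrap_auc_ci` are hypothetical.

```python
import random


def stratified_split(labels, fractions=(0.5, 0.25, 0.25), seed=0):
    """Split sample indices into parts while keeping the class balance.

    Assumption: the abstract only says "randomly divided"; stratifying
    by class is an illustrative choice, not the paper's stated method.
    """
    rng = random.Random(seed)
    parts = [[] for _ in fractions]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        start = 0
        for k, frac in enumerate(fractions):
            # The last part takes the remainder so no sample is dropped.
            if k == len(fractions) - 1:
                end = len(idx)
            else:
                end = start + round(frac * len(idx))
            parts[k].extend(idx[start:end])
            start = end
    return parts


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC.

    Assumption: the paper's CI method is not given in the abstract.
    """
    if len(set(labels)) < 2:
        raise ValueError("both classes are required to compute AUC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with one class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Usage with the abstract's class counts (271 cases, 271 controls).
labels = [1] * 271 + [0] * 271
train_idx, val1_idx, val2_idx = stratified_split(labels)
```

With a small external cohort such as the 92 cases reported here, a bootstrap interval of this kind is expected to be wide, which is consistent with the reported range of 0.750–0.918.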
