DOI: 10.1097/md.0000000000049402 ISSN: 0025-7974

An interpretable predictive model for depression risk in diabetic patients: A web-based application using NHANES data

Yishi Li, Tong Ren, Guanghong Zhou, Zhi Li, Chunyan Hu, Junfeng Zhao, Tianlin Guo, Yongqing Jiao, Chuanguang Zhou, Xun Wang

Depression is a common comorbidity in individuals with diabetes and is associated with adverse clinical outcomes. Early identification of high-risk individuals remains challenging due to the multifactorial and nonlinear nature of depression risk. Machine learning (ML) may enhance risk prediction but requires appropriate handling of class imbalance and sufficient interpretability for clinical application. The present study aimed to develop and rigorously evaluate an interpretable, class-imbalance-aware ML model for predicting depression risk among adults with diabetes, using nationally representative data from the National Health and Nutrition Examination Survey (NHANES). We analyzed cross-sectional data from 1140 adults with diabetes in the US (NHANES, 2007–2018). Depression was defined as a Patient Health Questionnaire-9 score ≥10. Predictors included demographic, clinical, lifestyle, and socioeconomic factors. Class imbalance was addressed using combined sampling and cost-sensitive learning. Seven ML models (logistic regression, support vector machine [SVM], random forest, adaptive boosting, decision tree, extreme gradient boosting, and categorical boosting [CatBoost]) were trained and evaluated. Model performance was assessed using the area under the receiver operating characteristic curve, with screening-oriented metrics optimized via threshold tuning. Model interpretability was examined using SHapley Additive exPlanations (SHAP), and clinical utility was evaluated using decision curve analysis. The SVM model demonstrated the most balanced performance after threshold optimization, achieving superior sensitivity and positive-class F1-score for depression detection. Key predictors identified by SHAP included chest pain, poverty-income ratio, sleep duration, sex, body mass index, physical activity, triglyceride levels, and diet quality (Healthy Eating Index–2020). Decision curve analysis indicated favorable net benefit for screening, particularly at lower risk thresholds. An interactive web-based application was developed to provide individualized risk predictions and explanations. An interpretable, imbalance-aware SVM model effectively predicts depression risk among adults with diabetes and supports individualized risk stratification, offering a potential tool for precision screening and early intervention.
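
The abstract reports threshold tuning for screening-oriented metrics and a decision curve analysis but gives no formulas. As an illustration only, the sketch below computes sensitivity and the standard decision-curve net benefit, TP/n − (FP/n) × p_t/(1 − p_t), at a chosen risk threshold p_t. It is not the authors' implementation: the function names and the toy labels and probabilities (`Y_TRUE`, `Y_PROB`) are hypothetical and are not NHANES data or study results.

```python
# Minimal sketch: sensitivity and decision-curve net benefit at a risk threshold.
# All names and the toy data below are hypothetical illustrations.

def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, fn, tn), predicting positive when y_prob >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(y_true, y_prob, threshold):
    """Fraction of true positives that the threshold flags."""
    tp, _, fn, _ = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of the model: TP/n - FP/n * pt / (1 - pt)."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that screens everyone positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example: 3 positives among 10 people (imbalanced, as in screening).
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
Y_PROB = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

if __name__ == "__main__":
    for pt in (0.25, 0.5):
        print(
            f"pt={pt:.2f} "
            f"sensitivity={sensitivity(Y_TRUE, Y_PROB, pt):.3f} "
            f"model NB={net_benefit(Y_TRUE, Y_PROB, pt):.3f} "
            f"treat-all NB={net_benefit_treat_all(Y_TRUE, pt):.3f}"
        )
```

In this toy data, lowering the threshold from 0.5 to 0.25 raises sensitivity at the cost of more false positives, which is the trade-off that threshold tuning for screening has to balance. A decision curve plots the model's net benefit against the treat-all and treat-none (net benefit 0) strategies over a range of thresholds.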
