Development and Validation of an XGBoost‐SHAP Model for Predicting Adverse Outcomes in Elderly Cardiovascular Patients With Polypharmacy: A Retros
Kun Wang, Dandan Cao
ABSTRACT
This study aimed to develop and validate an interpretable machine learning model using Extreme Gradient Boosting (XGBoost) with SHapley Additive exPlanations (SHAP) analysis to predict adverse outcomes in elderly cardiovascular patients with polypharmacy. This retrospective cohort study included 1200 patients aged ≥ 65 years with cardiovascular disease and polypharmacy (≥ 5 medications) treated at The First Affiliated Hospital of Soochow University between January 2021 and December 2024. Data were split into training (60%), validation (20%), and test (20%) sets. The primary outcome was adverse drug events (ADEs); secondary outcomes were 30‐day readmission, 90‐day readmission, and 1‐year mortality. XGBoost models were developed and compared with logistic regression, random forest, and support vector machine algorithms. SHAP values were used to identify key predictive features and quantify their contributions. The cohort had a mean age of 76.34 ± 8.12 years, and 18.92% of patients experienced ADEs during follow‐up. For ADE prediction, the XGBoost model discriminated better than logistic regression (AUC 0.857, 95% CI: 0.790–0.910 vs. AUC 0.795; p < 0.001). The model also performed well for secondary outcomes: 30‐day readmission (AUC 0.843, 95% CI: 0.779–0.899), 90‐day readmission (AUC 0.848, 95% CI: 0.784–0.895), and 1‐year mortality (AUC 0.873, 95% CI: 0.807–0.929), all significantly superior to logistic regression (all p < 0.001). At optimal thresholds, sensitivity ranged from 62.71% to 90.24% and specificity from 67.84% to 87.85% across outcomes. SHAP analysis identified age (importance: 0.196), estimated glomerular filtration rate (eGFR; 0.181), medication count (0.159), serum albumin (0.147), and Charlson Comorbidity Index (0.138) as the most influential predictors. The model maintained good calibration (Brier scores: 0.156–0.164) and consistent performance across patient subgroups.
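The performance measures reported above (AUC, Brier score, and sensitivity and specificity at an operating threshold) can be illustrated with a minimal sketch in plain Python. Several points here are assumptions rather than details taken from the study: the abstract does not state how the "optimal" threshold was chosen, so the sketch uses Youden's J statistic as one common choice; the function names are hypothetical; and the toy labels and probabilities are invented purely for illustration.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def youden_threshold(
    y_true: Sequence[int], y_prob: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1. A case is predicted positive
    when its probability is >= threshold.
    ASSUMPTION: Youden's J stands in for the unspecified 'optimal' rule."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_j, best = -1.0, (0.0, 0.0, 0.0)
    for thr in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= thr)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < thr)
        sens, spec = tp / n_pos, tn / n_neg
        if sens + spec - 1 > best_j:
            best_j, best = sens + spec - 1, (thr, sens, spec)
    return best


# Invented toy data (not study data): 1 = adverse drug event observed.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]

auc = roc_auc(toy_labels, toy_probs)
brier = brier_score(toy_labels, toy_probs)
threshold, sensitivity, specificity = youden_threshold(toy_labels, toy_probs)
```

In practice these quantities would be computed on the held-out test set, with confidence intervals obtained by a resampling method such as the bootstrap; the abstract does not say which method the authors used.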
SHAP threshold analysis identified eGFR < 45 mL/min/1.73 m² and serum albumin < 3.5 g/dL as key risk boundaries associated with markedly elevated adverse drug event risk. The XGBoost‐SHAP model provides accurate, interpretable prediction of adverse outcomes in elderly cardiovascular patients with polypharmacy. This approach may facilitate personalized risk assessment and targeted interventions to improve medication safety in this vulnerable population.
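To make the additive attribution behind SHAP concrete, the sketch below computes exact Shapley values for a two-feature toy risk function by enumerating feature coalitions. It is an illustration under stated assumptions, not the study's pipeline: `toy_risk` and `shapley_values` are hypothetical names, the risk increments are invented, and only the eGFR and albumin cut-offs are borrowed from the abstract. Tree ensembles are normally explained with a dedicated tree algorithm rather than brute-force enumeration, which is feasible here only because there are two features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of predict(x) relative to predict(baseline).
    Features outside a coalition are set to their baseline value."""
    features = list(x)
    n = len(features)

    def value(coalition: set) -> float:
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return predict(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk(p: Dict[str, float]) -> float:
    """HYPOTHETICAL risk function, not the fitted XGBoost model.
    Cut-offs follow the abstract; the increments are invented."""
    risk = 0.10
    if p["egfr"] < 45:
        risk += 0.20
    if p["albumin"] < 3.5:
        risk += 0.10
    if p["egfr"] < 45 and p["albumin"] < 3.5:
        risk += 0.10  # invented interaction term
    return risk


patient = {"egfr": 38.0, "albumin": 3.1}    # invented example patient
reference = {"egfr": 70.0, "albumin": 4.0}  # invented baseline patient

contributions = shapley_values(toy_risk, patient, reference)
```

The contributions sum to the difference between the patient's predicted risk and the baseline risk, which is the additivity property that lets per-feature SHAP values be read as shares of an individual prediction. The interaction term is split equally between the two features.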