Predicting ICD therapy events through machine learning: bridging clinical predictors and device data
T Chiba, S Wegner, R Haettasch, V Tscholl, N Dagres, W Haverkamp, G Hindricks, F Hohendanner
Abstract
Background
Implantable cardioverter-defibrillators (ICDs) play a key role in preventing sudden cardiac death by delivering antitachycardia pacing (ATP) or shock therapies. However, a substantial proportion of ICD recipients never experience appropriate therapy, underscoring the need for risk stratification to guide follow-up intensity and device programming. This study aimed to develop and validate a machine learning model to predict the occurrence of any ICD therapy, defined as ATP or shock, in a contemporary cohort of ICD recipients.
Methods
A retrospective analysis was performed on 516 consecutive patients who underwent ICD implantation between 2020 and 2024. Mean follow-up was 2.7 ± 1.6 years. Patients received a mean of 1.9 ± 15 ATP episodes and 0.7 ± 5 shocks. The binary endpoint was the occurrence of any ICD therapy during follow-up. Data were split into training and test sets (80:20) with stratification by outcome. A Random Forest classifier was trained using a standardized preprocessing pipeline comprising median imputation for continuous variables, mode imputation and one-hot encoding for categorical variables, and automated date conversion. Class imbalance was addressed by class-weighted sampling. Model performance was evaluated on the held-out test set using receiver operating characteristic (ROC) and precision–recall curves, calibration analysis, and confusion matrices at default (0.5), Youden-optimal, and F1-optimal thresholds. Feature importance was determined from the mean reduction in Gini impurity.
Results
The model achieved excellent discriminatory performance with an area under the ROC curve of 0.89 and a precision–recall AUC of 0.72. At the Youden-optimal threshold (J = 0.753), sensitivity was 0.75, specificity 0.88, and balanced accuracy 0.82. Calibration analysis demonstrated good agreement in the higher predicted probability range with mild underestimation at lower risk levels. The top-decile lift exceeded four times the baseline event rate, indicating strong enrichment of high-risk individuals. The most predictive features were non-sustained ventricular tachycardia, body mass index, RV defibrillation impedance, and left ventricular ejection fraction. These variables collectively accounted for more than 70% of the total model importance. Secondary contributors included atrial fibrillation, heart failure with reduced ejection fraction, hypertension, diabetes, and the use of class III antiarrhythmic agents.
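For readers less familiar with the threshold-based metrics reported here, the following is a small illustrative sketch in NumPy. The Youden index is J = sensitivity + specificity − 1, balanced accuracy is the mean of sensitivity and specificity, and top-decile lift is the event rate among the 10% of patients with the highest predicted risk divided by the overall event rate. The function names and the toy labels and probabilities are hypothetical and do not come from the study.

```python
import numpy as np


def rates_at(y_true, proba, threshold):
    """Return sensitivity, specificity and F1 at a probability threshold."""
    pred = proba >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    f1 = 2 * prec * sens / (prec + sens) if (prec + sens) > 0 else 0.0
    return sens, spec, f1


def best_thresholds(y_true, proba):
    """Scan observed probabilities for Youden- and F1-optimal thresholds."""
    candidates = np.unique(proba)
    stats = [rates_at(y_true, proba, t) for t in candidates]
    youden = [sens + spec - 1 for sens, spec, _ in stats]
    f1 = [f for _, _, f in stats]
    return candidates[int(np.argmax(youden))], candidates[int(np.argmax(f1))]


def top_decile_lift(y_true, proba):
    """Event rate in the top 10% of predicted risk over the overall rate."""
    n_top = max(1, int(np.ceil(0.1 * len(proba))))
    top = np.argsort(proba)[::-1][:n_top]
    return y_true[top].mean() / y_true.mean()


# Toy example (hypothetical values).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])
proba = np.array([0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.6, 0.8, 0.9])

t_youden, t_f1 = best_thresholds(y_true, proba)
sens, spec, _ = rates_at(y_true, proba, t_youden)
balanced_accuracy = (sens + spec) / 2
lift = top_decile_lift(y_true, proba)

print(t_youden, t_f1, round(balanced_accuracy, 3), round(lift, 2))
```

In practice, thresholds chosen this way are best selected on training or validation data and then applied unchanged to the test set, since optimizing them on the test set itself tends to give optimistic estimates.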
Conclusion
A machine learning model based on routinely available clinical and device-derived parameters predicted subsequent ICD therapy events in a contemporary patient cohort. The model demonstrated strong discrimination, adequate calibration, and clinically interpretable feature importance consistent with established electrophysiologic mechanisms. These findings support further investigation of AI-based prediction in routine ICD follow-up to enable individualized risk assessment in patients at risk for ventricular arrhythmias.
Figure: ROC
Table