DOI: 10.1093/ejhf/xuag193.1432 ISSN: 1388-9842

ICD-10-based machine learning classification model of left ventricular systolic dysfunction: a population-based risk stratification evaluation in 83,357 patients with acute coronary syndrome

A Cosa, M Llagostera, S Jovells-Vaque, E Vela, J Folguera-Profitos, D Monterde, J Piera-Jimenez, G Carot-Sans, R Ramos-Polo, O Merono, A Ricarte Marin, N Lopez Fernandez, M Andres-Villarreal, C Enjuanes-Grau, J Comin-Colet

Abstract

Background

A key prognostic measure in patients after Acute Coronary Syndrome (ACS) is left ventricular ejection fraction (LVEF) assessed before hospital discharge. Administrative datasets based on diagnostic codes offer multiple advantages for population-based studies; however, they are limited by the lack of structured clinical information—particularly LVEF—which is essential for risk stratification in ACS patients. Therefore, there is an unmet need to develop a model capable of classifying patients according to LVEF using ICD-10 diagnostic codes, enabling large-scale population-based analyses.

Purpose

First, we aimed to develop a machine learning (ML) model that uses diagnostic codes to estimate left ventricular systolic dysfunction (defined as LVEF <50%) after ACS. Second, we aimed to evaluate its risk-stratification capability in a population-based cohort.

Methods

We used a retrospective cohort of 2,228 ACS patients (2009–2010) from a single centre, linking pre-discharge echocardiography with ICD-10 codes. Standard data preprocessing, feature selection, model training and internal validation were applied, including cross-validation. Performance was evaluated through accuracy, precision and ROC-AUC, and interpretability was explored with SHAP (Shapley Additive Explanations).
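As an illustration of the cross-validation step, the sketch below shows one way to assign patients to stratified folds so that each fold keeps a similar proportion of LVEF <50% cases. It is a minimal sketch, not the study's pipeline: the function name `stratified_k_folds`, the labels, and the 30% positive rate are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds with similar class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share of this class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical labels: 1 = LVEF <50%, 0 = LVEF >=50% (illustrative data, not from the study).
labels = [1] * 300 + [0] * 700
folds = stratified_k_folds(labels, k=10)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A model would be fitted on train_idx and scored on test_idx at this point.
```

Each fold serves once as the held-out set, and the reported metrics are then averaged over the folds.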

To explore its risk-stratification potential at population scale, the final model was applied to 83,357 patients discharged alive after ACS in a population-based registry cohort (2011–2021). As LVEF was unavailable in this cohort, the model was used to classify patients and describe clinical profiles, not for external performance validation.

Results

A logistic regression model was developed and evaluated with stratified 10-fold cross-validation, showing moderate discrimination for LVEF <50% (average accuracy 0.764; ROC-AUC 0.738; precision 0.713; recall 0.295). SHAP highlighted diagnostic code I21.09 (Acute ST-elevation MI with involvement of another anterior coronary artery) and Killip-Kimball classification as key predictors (Figure 1).
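The combination of fairly high precision and low recall means that the model is usually right when it flags LVEF <50% but misses most patients who have it. The sketch below shows how these metrics follow from a confusion matrix. The counts are hypothetical, picked only to produce a similar pattern; they are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Precision: of the patients flagged as LVEF <50%, the share that truly are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the patients with LVEF <50%, the share that the model flags.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts (not from the study): 100 true positives in 300 patients.
metrics = classification_metrics(tp=30, fp=12, fn=70, tn=188)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, 70 of the 100 patients with reduced LVEF go undetected even though about 7 in 10 positive calls are correct.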

When the model was applied to the multicentre population-based cohort, patients classified as having LVEF <50% had a worse clinical profile. The incidence of heart failure was 2.5 times higher, and mortality risk was similarly elevated (HR: 2.45; 95% CI: 2.35-2.55; p <0.0001) (Figure 2).
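The reported hazard ratio, confidence interval and p-value can be checked for internal consistency. The sketch below assumes a Wald-type interval that is symmetric on the log scale, as is typical for Cox regression output; the abstract does not state how the interval was computed, so this is an assumption, and the helper name `wald_z_from_ci` is hypothetical.

```python
import math


def wald_z_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-scale standard error, z statistic and two-sided p-value
    from a hazard ratio and its 95% CI, assuming a Wald-type interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values reported for mortality in the abstract.
se, z, p = wald_z_from_ci(hr=2.45, lower=2.35, upper=2.55)

# On the log scale the point estimate should sit at the midpoint of the interval,
# so the geometric mean of the bounds should be close to the hazard ratio.
geometric_mean = math.sqrt(2.35 * 2.55)
```

Under this assumption the z statistic is roughly 43, which is consistent with the reported p <0.0001, and the geometric mean of the bounds is close to 2.45.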

Conclusions

This preliminary study shows that reduced LVEF (<50%) can be predicted with moderate discrimination from ICD-10 diagnostic codes using a logistic regression–based ML model. These findings point to the potential of routinely collected administrative data to support large-scale risk stratification in ACS and to inform clinical decision-making.

Figure 1. Top SHAP Predictors: I21.09 & Killip

Figure 2. Kaplan–Meier Curves for Main Events
