DOI: 10.1093/ejhf/xuag193.906 ISSN: 1388-9842

Next-generation machine learning-based risk assessment model, integrating omics, clinical and patient-centred data to prevent recurrent heart failure events

A Cosa, S Jovells-Vaque, J Perera-Bel, P Berenguer-Molins, N Jose-Bazan, R Ramos-Polo, C Enjuanes Grau, M Tajes, J Francesch Manzano, S Darnes Soler, X Lin, E Barragan Cabello, L Reverter Ortega, J Comin-Colet

Abstract

Background

Despite advances in heart failure (HF) management, mortality, hospital readmissions and healthcare-related costs remain markedly high. Although several integrated transitional care models have been developed, most focus on mortality prediction and none has been specifically designed to predict HF-related hospitalizations. There is a growing need for multimodal predictive approaches that better capture the biological and clinical complexity of HF.

Purpose

The ORACLE multicentric study aimed to develop a machine learning (ML) model integrating omics, clinical and psychosocial data to improve HF risk stratification and prediction of HF-related hospitalizations.

Methods

RNA sequencing (RNA-seq) was performed in 60 patients (30 cases, 30 controls) to identify candidate blood biomarkers associated with HF. Differential expression analysis followed by penalized generalized linear models (Lasso, Ridge) selected 22 genes. Their expression was then quantified by qPCR in an independent cohort of 119 patients. These gene-expression values were integrated with clinical and psychosocial variables and used as predictors in supervised learning models.
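To illustrate how a Lasso penalty can narrow a gene list, the following minimal sketch fits a Lasso by coordinate descent and keeps the genes whose coefficients remain non-zero. It is an assumption-laden illustration, not the study's pipeline: it uses a squared-error loss rather than a GLM for a case/control outcome, the function names (`lasso_select`, `standardize`, `soft_threshold`) and the penalty value are hypothetical, and the toy expression matrix is invented.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the Lasso proximal step)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def standardize(column):
    """Center a column and scale it to unit variance (constant columns become zeros)."""
    n = len(column)
    mean = sum(column) / n
    var = sum((v - mean) ** 2 for v in column) / n
    std = var ** 0.5 or 1.0  # avoid division by zero for constant columns
    return [(v - mean) / std for v in column]


def lasso_select(X, y, penalty, n_iter=200):
    """Return indices of features with non-zero Lasso coefficients.

    X is a list of rows (samples x genes); y is a numeric outcome (0/1 here).
    Minimizes (1/2n) * ||y - Xb||^2 + penalty * ||b||_1 on standardized features.
    """
    n, p = len(X), len(X[0])
    cols = [standardize([row[j] for row in X]) for j in range(p)]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual while all coefficients are zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            # Covariance of gene j with the residual that excludes gene j's own fit.
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, penalty)
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new_b
    return [j for j, b in enumerate(beta) if b != 0.0]


# Invented toy data: gene 0 tracks case status, gene 1 is unrelated, gene 2 is constant.
toy_expression = [
    [1, 1, 3], [2, 2, 3], [2, 1, 3], [1, 2, 3],
    [5, 1, 3], [6, 2, 3], [6, 1, 3], [5, 2, 3],
]
toy_status = [0, 0, 0, 0, 1, 1, 1, 1]
selected = lasso_select(toy_expression, toy_status, penalty=0.2)
```

A larger penalty removes more genes; in practice the penalty would be chosen by cross-validation rather than fixed by hand.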

Multiple algorithms (including logistic regression, random forest and CatBoost) were trained on 80% of the dataset using cross-validation and hyperparameter tuning. Internal performance was assessed on the remaining 20% using AUC and balanced accuracy (BA). Feature importance was examined using model-intrinsic metrics.
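The two evaluation metrics named above can be sketched in a few lines. This is a minimal illustration of how AUC and balanced accuracy are defined, not the study's evaluation code; the function names, the 0.5 decision threshold and the example labels and scores are all assumptions made for the example.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("balanced accuracy needs both classes present")
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


# Invented held-out labels (1 = hospitalized) and predicted risk scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
risk = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
predicted = [1 if r >= 0.5 else 0 for r in risk]  # hypothetical 0.5 threshold

auc = roc_auc(labels, risk)                    # 13 of 15 pairs ranked correctly
ba = balanced_accuracy(labels, predicted)      # sensitivity 2/3, specificity 4/5
```

AUC depends only on how the scores rank patients, whereas balanced accuracy depends on the chosen threshold and weights both classes equally, which is why the two are usually reported together.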

Results

RNA-seq identified 821 differentially expressed genes, of which 22 showed significant associations with HF and were validated by qPCR (Picture 1). In the integrated prediction model including genomic, clinical and patient-centred data, the CatBoost model achieved the best performance, with a mean AUC of ~0.75 and BA of ~0.70 on the internal test set (Picture 2).

Conclusions

Our findings suggest that these 22 genes are potential novel blood-based biomarkers of HF, and that integrating them with patient-centred data in an ML model may improve risk stratification and predict HF-related hospitalizations more accurately than current clinical methods.

Picture 1. Heatmap of 22 selected biomarker genes

Picture 2. ROC curves from 5-fold CV using CatBoost
