NLP-derived CMR phenotypes and reverse remodeling probability in cardiomyopathy
I Girard Cunha Vieira Lima, J T N Jairo Tavares Nunes, J A B Edimar Alcides Bocchi
Abstract
Background
Cardiomyopathies display marked heterogeneity in structural remodeling and outcomes, yet risk stratification remains largely centered on left ventricular ejection fraction (LVEF) and categorical late gadolinium enhancement (LGE) patterns. These approaches incompletely capture the multidimensional nature of myocardial disease. Whether quantitative fibrosis burden provides greater insight into myocardial recovery potential than fibrosis topology remains uncertain.
Purpose
To apply natural language processing (NLP) to routine cardiac magnetic resonance (CMR) reports to derive imaging-based phenotypes, generate a continuous probability of theoretical reverse remodeling (RR), and evaluate its prognostic value.
Methods
Consecutive adults with heart failure undergoing clinical CMR were retrospectively identified. Free-text CMR reports were transformed into structured variables using rule-based NLP, extracting ventricular volumes, systolic function, and LGE burden and pattern. Unsupervised k-means clustering of standardized imaging features identified structural phenotypes. A random forest model generated a continuous RR probability. All-cause mortality was assessed using Cox models. Nonlinear associations between CMR parameters, RR probability, and mortality were evaluated using spline models.
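The rule-based extraction step can be illustrated with a minimal sketch. The abstract does not specify the actual rules, so the regular expressions, the field names (`lvef_pct`, `lvedv_ml`, `lge_pct`, `lge_pattern`), and the example report text below are all hypothetical and serve only to show how free text might be mapped to structured variables.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration only; real report wording varies
# by institution, and these are not the rules used in the study.
NUMERIC_PATTERNS = {
    "lvef_pct": re.compile(r"\bLVEF\b[^0-9]{0,20}(\d{1,2}(?:\.\d+)?)\s*%", re.I),
    "lvedv_ml": re.compile(r"\bLVEDV\b[^0-9]{0,20}(\d{2,3}(?:\.\d+)?)\s*ml\b", re.I),
    "lge_pct": re.compile(r"\bLGE\b[^0-9]{0,40}(\d{1,2}(?:\.\d+)?)\s*%", re.I),
}

# Assumed vocabulary of LGE pattern descriptors, checked in this order.
LGE_PATTERN_TERMS = ("subendocardial", "transmural", "mid-wall", "subepicardial")


def extract_cmr_features(report: str) -> Dict[str, Optional[object]]:
    """Map a free-text CMR report to a flat dict of structured variables."""
    features: Dict[str, Optional[object]] = {}
    for name, pattern in NUMERIC_PATTERNS.items():
        match = pattern.search(report)
        # Missing values stay None so they can be handled downstream.
        features[name] = float(match.group(1)) if match else None
    lowered = report.lower()
    # First descriptor found in the text, or None if no descriptor appears.
    features["lge_pattern"] = next(
        (term for term in LGE_PATTERN_TERMS if term in lowered), None
    )
    return features


# Invented example report, not taken from the study data.
example_report = (
    "LVEDV: 245 mL. LVEF 28%. LGE present, transmural, involving 18% of LV mass."
)
example_features = extract_cmr_features(example_report)
```

Values that a pattern fails to find are returned as `None` rather than guessed, which keeps extraction failures visible before clustering or modeling.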
Results
Among 2,367 patients, three distinct phenotypes emerged: (1) mildly remodeled ventricles with preserved or mildly reduced function; (2) a severe ischemic phenotype with large biventricular volumes, depressed function, and high transmural LGE burden; and (3) a markedly remodeled non-ischemic phenotype with severe systolic dysfunction but intermediate fibrosis burden. RR probability differed substantially across clusters (mean 0.58, 0.28, and 0.73 for clusters 1–3, respectively; p<0.001). Quantitative LGE burden was strongly associated with RR probability, whereas LGE pattern contributed minimally. In Cox analysis, higher RR probability was strongly associated with lower mortality (hazard ratio per 0.10 increase ≈0.69; p<0.001) and remained independently predictive after adjustment for LVEF, which lost significance. Spline models demonstrated concordant nonlinear relationships between fibrosis burden, ventricular function, RR probability, and mortality risk.
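To make the reported effect size easier to read on other scales, the sketch below rescales a hazard ratio from one increment to another. It assumes RR probability enters the Cox model as a single log-linear term, which is an assumption made here for illustration (the abstract also reports nonlinear spline analyses); the function name `rescale_hazard_ratio` is hypothetical.

```python
import math


def rescale_hazard_ratio(hr: float, reported_step: float, new_step: float) -> float:
    """Convert a hazard ratio reported per `reported_step` to one per `new_step`.

    Assumes a log-linear effect: log(HR) scales in proportion to the
    size of the increment in the predictor.
    """
    beta_per_unit = math.log(hr) / reported_step  # log-hazard per unit of predictor
    return math.exp(beta_per_unit * new_step)


# Reported value: HR of about 0.69 per 0.10 increase in RR probability.
hr_per_020 = rescale_hazard_ratio(0.69, reported_step=0.10, new_step=0.20)
```

Under this assumption, a 0.20 increase in RR probability corresponds to a hazard ratio of about 0.69² ≈ 0.48. Because the reported 0.69 is itself approximate, rescaled values should be read as rough guides only.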
Conclusions
NLP applied to routine CMR reports enables scalable, multidimensional phenotyping of cardiomyopathy and generation of a continuous imaging-derived probability of reverse remodeling. Quantitative fibrosis burden, rather than LGE pattern, is the dominant determinant of myocardial recovery potential. This integrated structural metric provides prognostic information beyond LVEF and may improve CMR-based risk stratification in heart failure.