DOI: 10.1093/ejhf/xuag193.1041 ISSN: 1388-9842

Tracing myocardial contractility: performance of large language models on estimating ejection fraction and right ventricular function based on acute coronary syndrome ECGs

M Rocha, H Moreira, P Palma, A Pinho, E Oliveira, J Goncalves, B Cruz, B Viana, T Branco, E Figueiredo, L Alves, R Rodrigues

Abstract

Introduction/Background

Large language models (LLMs) are increasingly available for ECG interpretation, but their ability to estimate ventricular function from acute coronary syndrome (ACS) ECGs is still uncertain.

Aim

To compare ChatGPT and Perplexity for left ventricular ejection fraction (LVEF) and right ventricular function (RVF) estimation from 12-lead ECGs of ACS patients.

Methods

We retrospectively analysed 129 consecutive patients from a Portuguese hospital with emergent cath lab activation for suspected ACS (2024–2025). Reference LVEF and RVF were obtained from imaging. Each ECG was shown, with a standardised prompt, to ChatGPT and Perplexity, which were asked to provide an ECG diagnosis; to estimate LVEF in 5% categories, in 10% categories (<30%, 30–39%, 40–49%, ≥50%) and as reduced vs non-reduced (<40% vs ≥40%); and to classify RVF (normal, lower limit of normal, impaired). Accuracy vs reference was calculated, and paired LLM comparisons used McNemar’s test. Pre-specified subgroups were the most frequent ECG diagnoses: anterior myocardial infarction (MI) and inferior MI with its variants.
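The grading scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example values are hypothetical, the category labels are written in ASCII, and the 5% bins are left out because the abstract does not state their edges.

```python
from typing import Sequence


def lvef_category(lvef: float) -> str:
    """Map an LVEF percentage to the four categories listed in the Methods."""
    if lvef < 30:
        return "<30%"
    if lvef < 40:
        return "30-39%"
    if lvef < 50:
        return "40-49%"
    return ">=50%"


def is_reduced(lvef: float) -> bool:
    """Binary split used in the Methods: reduced means LVEF below 40%."""
    return lvef < 40


def accuracy(predicted: Sequence, reference: Sequence) -> float:
    """Fraction of cases where the model's answer equals the reference."""
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical reference LVEFs from imaging and hypothetical model answers.
reference_lvef = [25.0, 35.0, 45.0, 60.0]
model_answers = ["<30%", "40-49%", "40-49%", ">=50%"]

reference_categories = [lvef_category(v) for v in reference_lvef]
category_accuracy = accuracy(model_answers, reference_categories)
```

In this made-up example the model matches three of the four reference categories, so `category_accuracy` is 0.75.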

Results

Mean age was 63.5±13.8 years; 77% were male. Median LVEF was 45% (IQR 34–55); 38% had LVEF <40%. RVF was normal, borderline and impaired in 79%, 15% and 6%, respectively. Overall, ChatGPT vs Perplexity accuracy was 24.0% vs 22.5% for 5% LVEF bins (p=0.85), 34.9% vs 31.0% for 10% categories (p=0.47) and 65.1% vs 57.4% for reduced LVEF (p=0.13). RVF accuracy was higher, with ChatGPT outperforming Perplexity: 82.2% vs 73.6% (p=0.010; paired odds ratio 6.5). For anterior MI, reduced LVEF was correctly identified in 48.9% vs 35.6% and RVF in 97.8% vs 88.9%, numerically favouring ChatGPT (both p>0.20). For inferior MI, ChatGPT was also non-significantly superior in LVEF estimation (74.0% vs 68.0%) and RVF (60.0% vs 56.0%).
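As a rough consistency check on the paired RVF comparison, McNemar's test can be sketched from the discordant pairs alone. The counts used here are back-calculated, not reported: 82.2% and 73.6% of 129 correspond to about 106 and 95 correct classifications, a difference of 11, and a paired odds ratio of 6.5 is then matched by 13 cases where only ChatGPT was correct and 2 where only Perplexity was correct. The abstract does not say which variant of the test was used, so both the exact and the continuity-corrected forms are shown; the function name `mcnemar` is hypothetical.

```python
import math


def mcnemar(b: int, c: int) -> dict:
    """McNemar's test on discordant pair counts.

    b: pairs where only the first model was correct
    c: pairs where only the second model was correct
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs, test is undefined")

    # Exact two-sided p-value from Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)

    # Continuity-corrected chi-square statistic with 1 degree of freedom.
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    p_corrected = math.erfc(math.sqrt(chi2 / 2))

    odds_ratio = b / c if c else math.inf
    return {
        "p_exact": p_exact,
        "chi2": chi2,
        "p_corrected": p_corrected,
        "odds_ratio": odds_ratio,
    }


# Inferred (not reported) discordant counts for the RVF comparison.
result = mcnemar(13, 2)
print(result)
```

Under these inferred counts the continuity-corrected p-value comes out close to the reported 0.010, while the exact binomial form gives a somewhat smaller value; either way, the conclusion of a significant difference in RVF accuracy is unchanged.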

Conclusions

In our ACS cohort, two accessible LLMs provided only poor-to-modest LVEF estimation from acute ECG analysis. Accuracy improved, as expected, when LVEF was simplified to reduced vs non-reduced categories, but without significant differences between models. RVF estimation showed modest-to-good performance, and ChatGPT showed significantly higher overall RVF accuracy. Hence, current LLMs should not be used as stand-alone tools for initial ventricular function estimation and severity triage in ACS, particularly for left ventricular ejection fraction.
