DOI: 10.1093/ejhf/xuag193.1433 ISSN: 1388-9842

Deep learning in primary care health records to predict incident heart failure from temporal and contextual representation of antecedent risk factors

S Mahapatra, Chris Hayward, Ramesh Nadarajah, Derek Magee

Abstract

Background

Both non-diagnosis and delayed diagnosis of heart failure (HF) complicate clinical intervention and add to the burden of hospitalisation. In recent years, community-based electronic health record (EHR) data have increasingly been used to design predictive models that estimate the individual-level risk of developing incident HF. Current HF-prediction models rely on linked secondary care data in addition to primary care data, and there is insufficient evidence that they account for temporality in EHR data. A model developed on primary care EHR data that captures temporal dependencies would not only aid early clinical diagnosis but also be widely applicable in routine clinical practice.

Methods

We studied a cohort of individuals aged 0-100 years registered in UK primary care between Jan 2, 1998 and Feb 28, 2022. Cases and controls were matched on age and sex, and we oversampled them in a 1:2 ratio to minimise the risk of imbalance between the number of health records per patient in cases and controls. For each patient record, we created a contextual representation (contextual embedding) from the medical code descriptions using Clinical BERT, a pre-trained deep learning model. We then used a transformer model with a classification output (classifier); a transformer is a deep learning algorithm that processes sequential data and weighs data points using an attention mechanism to discriminate between classes. The cohort was randomly split for five-fold cross-validation (20% used for testing), which was used for model training. We limited candidate variables to contextual embeddings, age, sex, the relative time between the visit date and the HF-diagnosis date (or a surrogate HF date for controls), and the relative position of each visit in the chronological sequence of visits. The optimal transformer model using contextual embeddings had 1 attention head and 1 encoder layer, and used the Adam optimizer with a learning rate of 10^-4. Performance metrics were recorded to evaluate model prediction.
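The two temporal variables described above can be illustrated with a minimal sketch. The abstract does not state the units or scaling used, so measuring relative time in days and scaling visit position to the range 0 to 1 are assumptions, and the function name `temporal_features` and the example dates are hypothetical.

```python
from datetime import date


def temporal_features(visit_dates, index_date):
    """Return (days_before_index, relative_position) for each visit.

    Visits are sorted chronologically first. Units (days) and the
    [0, 1] position scaling are illustrative assumptions, not the
    authors' stated method.
    """
    ordered = sorted(visit_dates)
    n = len(ordered)
    features = []
    for rank, visit in enumerate(ordered):
        # Relative time: days between this visit and the index date
        # (HF-diagnosis date for cases, surrogate date for controls).
        days_before_index = (index_date - visit).days
        # Relative position of the visit in the chronological sequence;
        # a single-visit history is assigned position 0.0.
        relative_position = rank / (n - 1) if n > 1 else 0.0
        features.append((days_before_index, relative_position))
    return features


# Hypothetical patient with three visits, given out of order.
visits = [date(2015, 3, 1), date(2012, 6, 15), date(2018, 11, 20)]
index_date = date(2019, 1, 1)

for days, position in temporal_features(visits, index_date):
    print(days, position)
```

In a model such as the one described, features of this kind would be supplied alongside each visit's contextual embedding so that the attention mechanism can take both the ordering and the spacing of visits into account.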

Results

The cohort comprised 8,415,495 records from 15,296 UK individuals (median age 43.4 years, 45.9% women). Patients without incident HF were older than those with incident HF (51.2 vs 47.5 years, p<0.001). However, patients with incident HF had a higher prevalence of each of the risk factors, the top five being atrial fibrillation (36.6% vs 8.3%, p<0.001), hypertension (52.9% vs 32.5%, p<0.001), diabetes (24.7% vs 10.5%, p<0.001), myocardial infarction (20.3% vs 3.7%, p<0.001) and dyslipidemia (80% vs 62.35%, p<0.001).

The AUROC, sensitivity, precision and F1-score of the best model were 0.76, 0.7, 0.71 and 0.705, respectively.
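As a consistency check on these figures, the F1-score is the harmonic mean of precision and sensitivity (recall). The short sketch below recomputes it from the reported values; the function name `f1_score` is illustrative and is not taken from the authors' code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0  # conventional value when both metrics are zero
    return 2 * precision * recall / (precision + recall)


reported_sensitivity = 0.7
reported_precision = 0.71

f1 = f1_score(reported_precision, reported_sensitivity)
print(round(f1, 3))  # 0.705
```

The recomputed value agrees with the reported F1-score of 0.705.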

Conclusion

Our findings suggest that a transformer model incorporating temporal dependencies and contextual representations can give superior prediction of incident HF in nationally representative datasets using primary care data alone. This highlights the potential to reduce reliance on secondary care data when screening for incident HF. External and prospective validation is now required.
