DOI: 10.1177/20552076261464747 ISSN: 2055-2076

Machine learning-based prediction of dataset-defined myocardial infarction risk: A retrospective computational study for precision cardiovascular risk assessment

Mohammad Subhi Al-Batah, Abdullah Alourani

Objective

This study aimed to develop and evaluate machine learning and deep learning models for the early prediction of dataset-defined myocardial infarction/heart disease risk using structured demographic, clinical, electrocardiographic, exercise-testing, and angiography-related variables. It also aimed to improve the clinical interpretability and methodological transparency of artificial intelligence-based cardiovascular risk prediction.

Methods

A retrospective computational modeling design was applied to a merged public heart disease dataset containing 1,888 records and 14 variables. The outcome was the original binary dataset label indicating higher versus lower risk of heart disease/heart attack. Data preprocessing included missing-value management, categorical encoding, normalization of continuous variables, and stratified model evaluation. Three models were compared: Random Forest, Support Vector Machine, and a multilayer perceptron deep neural network. Model performance was assessed using accuracy, precision/positive predictive value, recall/sensitivity, F1-score, ROC-AUC, and class-wise support. A separate statistical analysis reported descriptive statistics, outcome distribution, feature correlations, and clinically relevant evaluation metrics.
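The class-wise metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function name `classwise_metrics` and the toy labels are hypothetical and chosen only to show how accuracy, precision/positive predictive value, recall/sensitivity, F1-score, and support relate to one another.

```python
from collections import Counter


def classwise_metrics(y_true, y_pred, labels=(0, 1)):
    """Return overall accuracy and per-class precision, recall, F1, and support.

    Hypothetical helper for illustration; labels are assumed to be 0 (lower
    risk) and 1 (higher risk), matching a binary dataset label.
    """
    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
        recall = tp / (tp + fn) if tp + fn else 0.0     # sensitivity for class c
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true instances of class c
        }
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    return accuracy, report


# Toy labels (made up, not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
accuracy, report = classwise_metrics(y_true, y_pred)
print(accuracy, report)
```

Reporting support alongside each class makes it clear how many records each precision and recall figure rests on.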

Results

The Random Forest model achieved the strongest overall performance, with 97.0% accuracy, class-wise precision of 96.0% to 97.0%, class-wise recall of 96.0% to 97.0%, a 97.0% F1-score, and an ROC-AUC of 0.97. The deep neural network achieved 92.3% accuracy and an ROC-AUC of 0.94, whereas the Support Vector Machine achieved 87.0% accuracy and an ROC-AUC of 0.87. The balanced class distribution of the dataset supported internal model training but may not reflect the lower prevalence of myocardial infarction in real-world healthcare settings.
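The prevalence caveat can be made concrete with a short sketch based on Bayes' rule. The sensitivity and specificity of 0.97 and the 5% prevalence below are illustrative assumptions, not values reported by the study, and `ppv_at_prevalence` is a hypothetical helper name.

```python
def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by Bayes' rule at a given prevalence."""
    true_pos = sensitivity * prevalence                # P(test+, disease+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(test+, disease-)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Assumed operating point: sensitivity = specificity = 0.97 (illustrative only).
balanced = ppv_at_prevalence(0.97, 0.97, prevalence=0.50)  # balanced dataset
low_prev = ppv_at_prevalence(0.97, 0.97, prevalence=0.05)  # hypothetical clinic
print(f"PPV at 50% prevalence: {balanced:.2f}")  # 0.97
print(f"PPV at 5% prevalence:  {low_prev:.2f}")  # 0.63
```

Under these assumptions, the same classifier's positive predictive value falls from 0.97 to roughly 0.63 when prevalence drops from 50% to 5%, which is why precision measured on a balanced dataset may not carry over to lower-prevalence populations.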

Conclusion

The findings suggest that Random Forest can provide strong internal predictive performance for structured cardiovascular risk classification. However, the outcome represents a dataset-defined surrogate classification rather than independently adjudicated acute myocardial infarction. Therefore, external validation, calibration, comparison with validated clinical risk scores, and prospective evaluation in real-world clinical populations are required before clinical deployment.
