Real-World Pharmacotherapy-Driven Cardiovascular Risk Prediction Using Interpretable Machine Learning and Jordanian EHR Data
Said Moshawih, Lobna Gharaibeh, Islam Alfreahat, Abeer Jabra Shnoudeh

Background: Cardiovascular disease (CVD) remains the leading cause of mortality worldwide, and more than 75% of CVD deaths occur in low- and middle-income countries. In these settings, conventional risk models often show poor calibration and limited generalizability.

Objective: This study aimed to develop an interpretable, pharmacotherapy-informed machine learning model for cardiovascular risk prediction using national electronic health record (EHR) data from Jordan.

Methods: We conducted a retrospective cohort study of approximately 600,000 individuals from Jordan's national Hakeem EHR system (2018–2022). Demographic, clinical, blood pressure, laboratory, and medication data were integrated to construct three datasets with varying levels of feature completeness. Multiple machine learning models were benchmarked and then refined through optimization, hybrid modeling, and probability calibration. Model interpretability was assessed with SHAP (SHapley Additive exPlanations) analysis.

Results: The national cohort carried a high cardiometabolic burden, with a prevalence of 50.2% for hypertension, 54.9% for hyperlipidemia, and 47.9% for diabetes. Antihypertensive and lipid-lowering therapies were used more frequently among CVD patients (56.9% and 49.6%, respectively). Amlodipine (19.9%) and atorvastatin (74.4%) were the predominant agents. The final calibrated, seed-bagged gradient boosting model achieved a ROC-AUC of 0.844 and a PR-AUC of 0.813, and its performance remained consistent across datasets. Key predictors included antihyperlipidemic therapy, systolic blood pressure variability, age, and sex.

Conclusions: This study presents JoRisk, a calibrated and interpretable machine learning framework that integrates pharmacotherapy and clinical data for short-term cardiovascular risk prediction. The model performs well using routinely available EHR variables and offers a scalable decision-support tool for risk stratification in resource-constrained healthcare systems.
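The abstract names a "calibrated seed-bagged gradient boosting model" but does not give its configuration. The following is a minimal sketch of the general technique, not the authors' pipeline, and rests on several assumptions: scikit-learn is available; synthetic data from `make_classification` stands in for the Hakeem EHR features; five seeds, the hyperparameters shown, and isotonic regression as the calibration method are all illustrative choices; and `fit_seed_bagged_gbm` and `predict_bagged` are hypothetical helper names. Several gradient boosting models that differ only in their random seed are trained, their predicted probabilities are averaged, and the averaged scores are then calibrated on a held-out split. PR-AUC is estimated here with average precision.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an EHR feature matrix (assumption: NOT the Hakeem data).
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           weights=[0.55, 0.45], random_state=0)

# Three-way split: fit the ensemble, fit the calibrator, evaluate.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)


def fit_seed_bagged_gbm(X, y, seeds=(0, 1, 2, 3, 4)):
    """Hypothetical helper: fit one gradient boosting model per random seed."""
    models = []
    for seed in seeds:
        # subsample < 1.0 makes each fit stochastic, so each seed yields a different model.
        model = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                           learning_rate=0.05, subsample=0.8,
                                           random_state=seed)
        models.append(model.fit(X, y))
    return models


def predict_bagged(models, X):
    """Hypothetical helper: average positive-class probabilities across seeds."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)


models = fit_seed_bagged_gbm(X_train, y_train)

# Isotonic calibration fitted on the held-out calibration split (assumed method).
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(predict_bagged(models, X_calib), y_calib)

# Calibrated risk scores on the untouched test split.
calibrated = calibrator.predict(predict_bagged(models, X_test))
roc_auc = roc_auc_score(y_test, calibrated)
pr_auc = average_precision_score(y_test, calibrated)  # average precision as a PR-AUC estimate
print(f"ROC-AUC {roc_auc:.3f}  PR-AUC {pr_auc:.3f}")
```

Averaging over seeds mainly reduces the variance introduced by row subsampling. Fitting the calibrator on a split the ensemble never saw keeps the calibrated probabilities from inheriting the models' optimism on their own training data.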