Hypertension Detection Using Explainable Stacked Ensemble Machine Learning From Clinical and Physiological Data: A Comprehensive Study
Shah Muhammad Azmat Ullah, A. B. M. Aowlad Hossain, Md. Ebtidaul Karim
ABSTRACT
Background and Aims
Hypertension, or high blood pressure, is a common, life‐threatening cardiovascular disease (CVD) worldwide. In the era of information technology, communication, and artificial intelligence, early prediction of hypertension can alert patients in time. This research aims to detect hypertension automatically using an ensemble of machine learning models. It examines the use of both clinical and physiological data and compares the performance and explainability of the proposed model with existing models and published work.
Methods
This study proposes a stacked ensemble learning‐based model to detect hypertension. Three popular classifiers, K‐Nearest Neighbor (KNN), Random Forest (RF), and Light Gradient Boosting Machine (LGBM), serve as the base classifiers of the stack, and a Support Vector Machine (SVM) classifier serves as the meta‐classifier. A large, publicly accessible dataset of 21,613 patients containing clinical and physiological data related to hypertension is used. Additionally, data diversity is considered to test the generalization capability of the proposed model. Three datasets containing only clinical data, covering both male and female subjects and their combination, are used to train and evaluate the model, with an emphasis on improving the classifier's generalization. To address class imbalance, the framework applies the Synthetic Minority Oversampling Technique combined with Tomek links (SMOTE‐Tomek), and several feature selection techniques are used to compare the impact of features on the model. Several performance evaluation metrics are used to assess the classifier under the different dataset cases. Moreover, the explainability of the proposed model is examined using SHapley Additive exPlanations (SHAP) values, and the feature importance assigned by the model is found to be sensible.
Results
The results show that the proposed model achieves higher accuracy than the alternative models and previously published work considered. On clinical and physiological data, the proposed stacked ensemble detects hypertension with accuracies of 85.90%, 86.72%, and 85.91% for feature sets of 21, 10, and 8 features, respectively. On clinical data alone, the model achieves 89.58%, 58.54%, 84.31%, 79.84%, and 77.78% on datasets I, II, III, IV, and I + II, respectively. Different combinations of feature sets are compared, and the proposed model is benchmarked against other single and ensemble models, to identify the configuration with the highest accuracy.
Conclusion
The outcomes of this research can be useful in healthcare and in predictive analytics for hypertension. By emphasizing timely detection, the research underscores the model's potential to reduce individual health risks and enable proactive intervention, highlighting the role of AI‐driven solutions in healthcare practice.