Predicting Chronic Kidney Disease from Biomarkers: An Explainable Machine Learning Approach
Abass Al-Momany, Omar Almomani, Ensaf Y. Almomani
Background/Objectives: Chronic kidney disease (CKD) remains underdiagnosed until advanced stages, motivating reliable, clinically deployable screening models that pair high discrimination with an explicit operating threshold and transparent explanations.
Methods: We propose a CKD detection framework that integrates structured preprocessing, class imbalance handling, stratified 10-fold cross-validation with out-of-fold (OOF) prediction, and clinically oriented threshold selection via the Youden index, followed by explainability analysis using SHAP and LIME. Experiments were conducted on two datasets, comparing ten machine learning models.
Results: Gradient boosting methods consistently outperformed the other models, and LightGBM achieved the best overall clinical composite performance on both datasets. On Dataset 1, LightGBM delivered near-ceiling OOF discrimination (ROC-AUC = 99.98, PR-AUC = 99.98). At the best Youden threshold (0.41), it reached sensitivity = 99.20, specificity = 99.60, accuracy = 99.40, F1 = 99.40, and MCC = 98.80. Cross-validation performance was stable (CV AUC = 99.99 ± 0.04; CV sensitivity = 99.10 ± 1.81; CV specificity = 99.46 ± 1.42; CV MCC = 98.59 ± 2.19), calibration was strong (Brier = 0.006), and training was fast (0.078 ± 0.019 s/fold). On Dataset 2, LightGBM maintained high discrimination (ROC-AUC = 99.72, PR-AUC = 99.64). At the best Youden threshold (0.35), it achieved sensitivity = 98.10, specificity = 98.03, accuracy = 98.06, F1 = 98.06, and MCC = 96.13. Fold-wise performance was consistent (CV AUC = 99.69 ± 0.25; CV sensitivity = 97.25 ± 1.25; CV specificity = 98.11 ± 1.02; CV MCC = 95.37 ± 1.56), calibration was acceptable (Brier = 0.0173), and training time remained practical (0.742 ± 0.144 s/fold).
Conclusions: SHAP and LIME explanations confirmed that model decisions align with clinically meaningful renal function and symptom/biomarker patterns at both the population and patient levels, supporting safer translation of the proposed framework into CKD screening and decision-support workflows.
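The threshold-selection step named in the Methods can be illustrated with a minimal sketch. It assumes the OOF probabilities have already been collected from stratified cross-validation and that a case is called positive when its probability is at or above the threshold. The function names and the toy labels and scores below are hypothetical, chosen only for illustration; they are not the authors' code or data.

```python
from math import sqrt


def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true, y_prob):
    """Scan the observed scores and return (threshold, J) maximizing
    J = sensitivity + specificity - 1. Ties keep the lowest threshold.
    Assumes both classes are present in y_true."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_prob)):
        tp, fp, tn, fn = confusion_counts(y_true, y_prob, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def metrics_at(y_true, y_prob, threshold):
    """Threshold-dependent metrics plus the threshold-free Brier score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "brier": sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true),
    }


# Hypothetical OOF output: 6 non-CKD (0) and 4 CKD (1) cases.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.25, 0.30, 0.60, 0.40, 0.70, 0.85, 0.95]

threshold, j_stat = youden_threshold(y_true, y_prob)
report = metrics_at(y_true, y_prob, threshold)
print(threshold, round(j_stat, 4), report)
```

Because the threshold is chosen on pooled OOF predictions rather than on training-fold predictions, every score used in the scan comes from a model that never saw that case, which is what makes the reported operating point less optimistic than one tuned in-sample.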