DOI: 10.1111/dme.70398 ISSN: 0742-3071

Comparative predictive accuracy of machine learning versus traditional statistical methods for diabetes‐related complications: A systematic review and analysis

Bismah Ghafoor, Francesco Zaccardi, Kamlesh Khunti, Sharmin Shabnam

Abstract

Aims

To compare the predictive accuracy of machine learning models versus traditional statistical models for predicting and detecting long‐term complications among individuals with diabetes (PROSPERO: CRD420250629747).

Methods

We systematically searched MEDLINE, PubMed, Cochrane and Scopus (2014–2025) for studies developing or validating prediction models in people with diabetes. Excluding case–control studies, we identified 36 eligible studies (280 model comparisons) from 18,237 records. We extracted study design, model details and performance metrics (primarily C‐statistics). Risk of bias was assessed using PROBAST.

Results

Across 280 comparisons, ensemble machine learning methods frequently outperformed logistic regression. Random forest models achieved higher discrimination in 63% (43/68) of comparisons and extreme gradient boosting in 58% (14/24), whereas support vector machines improved performance in only 44% (24/55). Gains in predictive accuracy were generally modest. Methodological quality was concerning: external validation was reported in only 8% (3/36) of studies and calibration in 13% (5/36), and 59% of studies were at high risk of bias.

Conclusions

Machine learning models, particularly ensemble methods, offer modest discrimination improvements over traditional statistical models for predicting diabetes‐related complications. However, widespread methodological limitations, specifically the lack of external validation, inconsistent calibration reporting and high risk of bias, substantially limit confidence in these findings and the clinical readiness of the models. Rigorous external validation and transparent reporting are needed before routine implementation.
