DOI: 10.3390/diagnostics16132031 ISSN: 2075-4418

Comparative Evaluation of Machine Learning Models and Conventional Formulas for LDL Cholesterol Estimation

Bagnu Orhan, Levent Deniz, Cengiz Aydin, Ipek Deveci Kocakoc

Background/Objectives: This study aimed to develop and externally validate machine learning (ML) models for low-density lipoprotein cholesterol (LDL-C) estimation and to compare their analytical and clinical performance with that of conventional formulas, particularly in individuals with elevated triglyceride (TG) levels. Methods: This retrospective study included 11,681 adults whose lipid profiles were retrieved from a laboratory information system. ML models (linear regression, random forest, support vector regression, and XGBoost) were developed using routine lipid parameters and evaluated using 10-fold cross-validation. Performance was assessed using the mean absolute error (MAE), root mean squared error (RMSE), bias, correlation, Bland–Altman agreement, and clinical classification according to LDL-C categories. Subgroup analyses were conducted across TG strata, with an emphasis on TG ≥ 400 mg/dL. Results: ML models generally demonstrated lower error and higher agreement with directly measured LDL-C levels than conventional formulas. XGBoost showed the best overall performance (MAE: 14.7 mg/dL; RMSE: 20.22 mg/dL; R² = 0.780; r = 0.88) and the lowest deviation. The ML models also showed higher clinical classification accuracy (up to 66%). Performance declined with increasing TG levels, particularly for conventional formulas, whereas ML models remained more stable, including in patients with TG ≥ 400 mg/dL. In external validation across independent cohorts and analytical platforms, the XGBoost model performed stably and generally achieved higher classification accuracy than conventional LDL-C estimation formulas. Conclusions: ML-based LDL-C estimation may represent a complementary alternative to conventional formulas, particularly in hypertriglyceridemic populations.
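To make the error metrics named in the Methods concrete, the following minimal sketch computes MAE, RMSE, and bias for a formula-based LDL-C estimate against directly measured values. The abstract does not name the conventional formulas that were evaluated; the Friedewald equation (LDL-C = TC − HDL-C − TG/5, in mg/dL) is assumed here only as a widely used example. The function names and the sample values are hypothetical and are not taken from the study.

```python
from math import sqrt


def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate in mg/dL (assumed example formula): TC - HDL-C - TG/5."""
    return tc - hdl - tg / 5.0


def error_metrics(estimated, measured):
    """Return MAE, RMSE, and bias (mean of estimated minus measured)."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    n = len(diffs)
    return {
        "mae": sum(abs(d) for d in diffs) / n,
        "rmse": sqrt(sum(d * d for d in diffs) / n),
        "bias": sum(diffs) / n,
    }


# Made-up illustrative rows: (TC, HDL-C, TG, directly measured LDL-C), all in mg/dL.
samples = [
    (200, 50, 150, 118),
    (240, 40, 300, 145),
    (180, 60, 100, 98),
]

estimates = [friedewald_ldl(tc, hdl, tg) for tc, hdl, tg, _ in samples]
measured = [row[3] for row in samples]
metrics = error_metrics(estimates, measured)
print(metrics)
```

The same `error_metrics` helper could be applied to the predictions of any of the ML models, so that formulas and models are compared on identical terms.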
