Predictors of death in diabetes-associated Enterobacteriaceae sepsis: A Mendelian randomization and machine learning study
Hengtong Lou, Kaiyu Han, Yunpeng Wang
Diabetes mellitus is a common comorbidity that may increase susceptibility to severe infections and worsen their outcomes. We aimed to identify clinical predictors of in-hospital mortality in diabetic patients with Gram-negative Enterobacteriaceae sepsis using a data-driven feature-selection and machine learning (ML) approach, and to examine the causal relationship between diabetes and sepsis using Mendelian randomization (MR). We performed a retrospective cohort study of patients with diabetes and Gram-negative Enterobacteriaceae sepsis who were admitted to The Second Affiliated Hospital of Harbin Medical University between January 2018 and August 2025. The Boruta algorithm was used to identify variables important for mortality prediction; the selected features were entered into ML classifiers, whose performance was evaluated using receiver operating characteristic (ROC) analysis. Separately, an MR analysis was conducted with diabetes as the exposure and sepsis as the outcome. The inverse-variance weighted method was used as the primary estimator, and the MR-Egger, weighted median, simple mode, and weighted mode methods were applied as sensitivity analyses to assess robustness and potential directional pleiotropy. Boruta selection retained 10 variables as important predictors of in-hospital mortality: intensive care unit admission, concomitant shock, respiratory failure, liver abscess, coma, anemia, decreased platelet count, reduced red blood cell count, infection with extended-spectrum β-lactamase-producing organisms, and infection with carbapenem-resistant Enterobacteriaceae. Among the ML models, the neural network classifier achieved the highest discriminative performance on the validation set (area under the curve = 0.957; 95% confidence interval: 0.905–1.000). MR indicated a modest but statistically significant association between genetically predicted diabetes and the risk of sepsis (inverse-variance weighted