DOI: 10.3390/healthcare14131869 ISSN: 2227-9032

Machine Learning Prediction of Clostridioides difficile Infection in Hospitalized COVID-19 Patients Across Pandemic Waves

Oliver Lohaj, Pavel Kočan, Anna Biceková, Daniela Javorská

Background/Objectives: Clostridioides difficile infection (CDI) represents an important healthcare-associated complication in hospitalized patients, particularly in those exposed to antibiotics, prolonged hospitalization, and intensive treatment during COVID-19. This study aimed to design, evaluate, and interpret machine learning models for predicting CDI occurrence in hospitalized COVID-19 patients across individual pandemic waves, with respect to administered treatment and clinical characteristics. Methods: Anonymized clinical data from 3848 COVID-19-positive patients treated at the University Hospital of L. Pasteur in Košice, Slovakia, were analyzed following the CRISP-DM methodology. Four classification models were compared: logistic regression, Random Forest, XGBoost, and a multilayer perceptron. Missing values were addressed using MICE and KNN imputation, and class imbalance was handled through oversampling techniques. Given the low CDI prevalence of 2.68%, model performance was primarily assessed using the precision–recall area under the curve (PR-AUC), with AUROC reported for comparability. Interpretability was supported using SHAP, LIME, and odds ratio analysis. Results: The best-performing models achieved PR-AUC values up to 0.160, representing more than a fivefold improvement over the random baseline of 0.027. XGBoost reached the highest AUROC of 0.823, followed by Random Forest with 0.798. Inflammatory markers were identified as important predictors of CDI risk. A Flask-based decision-support web application was developed to provide CDI risk estimation with patient-specific explanations. A preliminary pilot usability evaluation involving two physicians yielded a mean System Usability Scale score of 73.75; however, the very small evaluator sample limits the generalizability of this finding. Conclusions: Interpretable machine learning models can support clinically meaningful CDI risk stratification in highly imbalanced COVID-19 hospital datasets. The proposed decision-support tool shows potential for future integration into clinical workflows, although external and prospective validation is required.
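
The relationship between the reported prevalence, the random PR-AUC baseline, and the "more than fivefold" improvement can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the study's code: the variable names are assumptions, and the only inputs are the figures quoted in the abstract. It relies on the standard property that a classifier ranking patients at random has an expected PR-AUC approximately equal to the positive-class prevalence.

```python
# Illustrative arithmetic only; all inputs are figures quoted in the abstract.
# Variable names are hypothetical and do not come from the study's code.

prevalence = 0.0268   # reported CDI prevalence (2.68%)
best_pr_auc = 0.160   # best reported PR-AUC

# For a random ranking, expected PR-AUC is roughly the positive-class
# prevalence, which gives the baseline of about 0.027 cited in the abstract.
random_baseline = round(prevalence, 3)

# Ratio of the best model's PR-AUC to the random baseline.
fold_improvement = best_pr_auc / random_baseline

print(f"Random PR-AUC baseline: {random_baseline:.3f}")
print(f"Improvement over baseline: {fold_improvement:.2f}x")
```

Under these inputs the ratio comes out a little below six, consistent with the abstract's wording. The same reasoning explains why PR-AUC is the primary metric here: with so few positive cases, AUROC can look strong while precision among flagged patients remains modest.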
