Early prediction of prolonged length of stay in heart failure admissions using machine learning models
P Darko, B Otchere, P Berchie, E Hama, E Vince, X Salazar, B Demoss, A Krishnamoorthy, R Singh, R Lougani, K Bauza, S Damle, E Molina, C Marti
Abstract
Background
Prolonged hospital length of stay (LOS) is a major driver of inpatient bed utilization and healthcare costs in cardiology admissions. Heart failure accounts for a substantial proportion of cardiovascular hospitalizations, yet early identification of patients at risk for prolonged stay remains difficult. Most existing prediction approaches rely on data accrued later in the hospital course, limiting operational value.
Purpose
To develop and evaluate early machine learning models to predict prolonged hospital LOS using routinely available admission data in patients hospitalized with heart failure.
Methods
We conducted a retrospective cohort study using the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. Adult index hospital admissions with a primary diagnosis of heart failure were included. Prolonged LOS was defined as hospitalization exceeding the 75th percentile of the cohort distribution. Predictors were limited to variables available within the first 24 hours of admission, including demographics, insurance status, comorbidities, diagnostic category, and early medication use. The cohort was split into training (80%) and testing (20%) sets using stratified sampling. Multivariable logistic regression, lasso regression, and random forest models were developed. Model performance was assessed using area under the receiver operating characteristic curve (AUC) and calibration.
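The outcome definition, the stratified split, and the AUC metric described above can be illustrated with a minimal sketch. This is not the study code: the function names (`label_prolonged`, `stratified_split`, `auc`) are hypothetical, the percentile interpolation rule is an assumption (the abstract does not state one), and only the Python standard library is used so that the steps stay visible.

```python
import random
from statistics import quantiles


def label_prolonged(los_days):
    """Return (threshold, labels): 1 if LOS exceeds the cohort's 75th percentile.

    Assumption: linear interpolation between order statistics
    (the "inclusive" method); the abstract does not specify a rule.
    """
    threshold = quantiles(los_days, n=4, method="inclusive")[2]
    return threshold, [1 if d > threshold else 0 for d in los_days]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately
    so both sets keep roughly the same share of prolonged stays."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because the label is derived from the 75th percentile, about a quarter of admissions are positive by construction, which is why the split is stratified. The pairwise AUC above is quadratic in cohort size and is meant only to show what the metric measures; a rank-based implementation would be preferable at the scale of this cohort.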
Results
Among 80,590 heart failure admissions, mean age was 72.8 ± 13.6 years, 45.9% were female, and mean LOS was 7.05 ± 8.9 days; the 75th percentile threshold for prolonged stay was 8.5 days. All models demonstrated modest discrimination for prolonged LOS, with AUCs of 0.611 for logistic regression, 0.610 for lasso regression, and 0.626 for the random forest model. Logistic regression showed good calibration across risk deciles. Increasing age was associated with lower odds of prolonged hospitalization (odds ratio 0.989 per year; 95% CI 0.988–0.991; p<0.001). Insurance status was not independently associated with prolonged LOS, while race and ethnicity showed significant associations, including higher odds among patients with missing race data. In random forest analysis, age was the most influential predictor, followed by race and insurance status.
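A per-year odds ratio so close to 1 is easier to read on a larger scale. The sketch below rescales the reported estimate and its confidence limits to a 10-year age difference. It assumes age entered the logistic model as a linear term on the log-odds scale, which the "per year" odds ratio implies but the abstract does not state; the helper name `rescale_odds_ratio` is hypothetical.

```python
import math


def rescale_odds_ratio(or_per_unit, units):
    """Odds ratio for a difference of `units`, given the odds ratio per one unit.

    Under a linear log-odds term, log-odds differences add,
    so the odds ratio multiplies: OR_k = exp(k * ln(OR_1)).
    """
    return math.exp(units * math.log(or_per_unit))


# Reported values per year of age: OR 0.989 (95% CI 0.988 to 0.991).
or_decade = rescale_odds_ratio(0.989, 10)
ci_low_decade = rescale_odds_ratio(0.988, 10)
ci_high_decade = rescale_odds_ratio(0.991, 10)
```

Under that assumption, the result corresponds to roughly 10% lower odds of prolonged stay per decade of age (odds ratio about 0.90, interval about 0.89 to 0.91). Because the inputs are already rounded to three decimals, the rescaled values are approximate.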
Conclusion
Early machine learning models using routinely available admission data can modestly predict prolonged LOS in heart failure admissions, with good calibration and minimal incremental benefit from more complex models. Sociodemographic factors, particularly age and race, were dominant predictors, suggesting that prolonged hospitalization reflects system- and care-process factors in addition to clinical severity. Early risk stratification may aid operational planning and discharge coordination in cardiology services.