Explainable Machine Learning for Predicting Dengue Recovery Duration: Insights from Multi-Center Clinical Data
Adam Khan, Asad Ali, Fazal Hanan, Muhammad Ismail Mohmand
Background: Dengue fever remains a major public health challenge in endemic regions, where recovery duration varies considerably across patients because of a combination of clinical, demographic, and contextual factors. Although machine learning (ML) approaches are increasingly applied to dengue-related prediction tasks, many existing models operate as black boxes, which limits their interpretability and practical usefulness in healthcare settings. This study presents an Explainable Artificial Intelligence (XAI)-based machine learning framework for analyzing dengue recovery duration using a multi-center clinical dataset collected from healthcare institutions across Khyber Pakhtunkhwa, Pakistan.
Methods: Clinical records from 100 laboratory-confirmed dengue patients treated across multiple healthcare institutions were analyzed. The dataset included demographic, socio-economic, and clinical variables. Four machine learning models (Linear Regression, Decision Tree, Random Forest, and Neural Network) were developed and evaluated using 10-fold cross-validation. Explainability techniques, including Partial Dependence Plots (PDP), Individual Conditional Expectation (ICE), and Local Interpretable Model-Agnostic Explanations (LIME), were employed to investigate global and patient-specific factors influencing recovery duration.
Results: Among the evaluated models, Random Forest demonstrated the best overall predictive performance, achieving the lowest Root Mean Square Error (RMSE; 11.29 days) and Mean Absolute Error (MAE; 9.09 days), corresponding to a 40.4% reduction in prediction error compared with Linear Regression. Decision Tree also showed substantial improvement, reducing RMSE by 37%, whereas the Neural Network achieved a more modest improvement of 8.6%.
Although all models exhibited low coefficient of determination (R²) values (maximum R² = 0.026), the explainability analyses consistently identified age and platelet count as the most influential predictors of recovery duration. Older age and lower platelet counts were generally associated with longer recovery periods, while hospital type, education level, and blood group also contributed to prediction outcomes. ICE and LIME analyses further revealed considerable patient-level heterogeneity, indicating that recovery trajectories are shaped by complex interactions among clinical, demographic, and contextual factors rather than by a single dominant predictor.
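The evaluation and explainability workflow summarized above can be illustrated with a minimal sketch. It is not the authors' code. It assumes scikit-learn, uses a synthetic two-feature stand-in (hypothetical `age` and `platelets` columns) rather than the study's data, and picks arbitrary hyperparameters. It compares the four model families with 10-fold cross-validation, expresses each RMSE as a percentage reduction relative to Linear Regression, and then computes PDP and ICE curves for one feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (ASSUMPTION: not the study's dataset or variables).
rng = np.random.default_rng(0)
n = 100
age = rng.uniform(5, 80, n)          # hypothetical feature: age in years
platelets = rng.uniform(20, 300, n)  # hypothetical feature: platelet count
X = np.column_stack([age, platelets])
# Hypothetical target: recovery duration in days, with noise.
y = np.clip(5 + 0.1 * age - 0.02 * platelets + rng.normal(0, 3, n), 1, None)

# Hyperparameters below are illustrative assumptions.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=3, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# 10-fold cross-validation: every patient is predicted by a model
# that did not see that patient during training.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=cv)
    rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
    mae = float(np.mean(np.abs(y - pred)))
    results[name] = {"rmse": rmse, "mae": mae}

# Relative RMSE reduction (in percent) against the Linear Regression baseline.
baseline = results["Linear Regression"]["rmse"]
for name, r in results.items():
    r["rmse_reduction_pct"] = 100.0 * (baseline - r["rmse"]) / baseline
    print(f"{name}: RMSE={r['rmse']:.2f}, MAE={r['mae']:.2f}, "
          f"reduction vs. baseline={r['rmse_reduction_pct']:.1f}%")

# PDP and ICE for the first feature (age), using a forest fitted on all rows.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pd_result = partial_dependence(rf, X, features=[0], kind="both",
                               grid_resolution=20)
pdp_curve = pd_result["average"][0]      # one global curve over the age grid
ice_curves = pd_result["individual"][0]  # one curve per patient
```

In this sketch the PDP curve is the pointwise mean of the ICE curves, so spread among the ICE curves is one way to see the kind of patient-level heterogeneity the abstract reports. LIME is omitted here because it requires a separate package.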