Explainable AI and Ensemble Machine Learning Analysis of River Flow Dynamics: Influence of Key Climatic Variables (Temperature, Humidity, Precipitation)
Mustafa Çakır, Gizem Nazlı Ural, Mükerrem Oral, Okan Oral, Mesut Yılmaz
Abstract
Accurate short-term river flowrate forecasting is essential for flood risk mitigation and sustainable water management. However, many machine learning (ML) applications in hydrology lack strict temporal validation and interpretability, limiting operational reliability. This study develops a reproducible and explainable framework for one-day-ahead daily flowrate forecasting in the Eşen River Basin (Türkiye) using hydro-meteorological data (2017-2022) from one flowrate station and four meteorological monitoring stations. The workflow integrates the Box-Cox transformation, lag-based feature engineering (up to 3 days, reflecting short-term hydrological memory), Boruta feature selection, and strictly time-aware rolling validation (2017-2021 training; 2022 independent test). Classical time-series models (ARIMA, TBATS), interpretable baselines (DT, LR), and advanced ML algorithms (RF, GBM, XGBoost, SVM, ANN) were benchmarked using RMSE, MAE, R², NSE, and KGE. Ensemble tree-based models consistently outperformed classical and baseline approaches in magnitude-sensitive metrics. XGBoost achieved the highest predictive accuracy (R² = 0.864; NSE = 0.864; RMSE = 6590 dm³ s⁻¹). Although TBATS yielded the highest KGE (0.865), ensemble models better captured nonlinear dynamics and flowrate variability. SHAP and LIME analyses revealed that short-term flow lags dominate the predictive structure, while precipitation and temperature exert regime-dependent influence. The complete workflow is openly deployed via a reproducible R Shiny environment. The results demonstrate that explainable ensemble learning, combined with strict temporal validation, provides a reliable and transparent framework for operational hydrological forecasting.
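As an illustration of three ingredients named in the abstract (lagged flow features up to 3 days, a chronological train/test split, and the NSE and KGE skill scores), the following is a minimal Python sketch. It is not the authors' R implementation. The function names, the `(date, flow)` row layout, and the choice of the original 2009 Kling-Gupta formulation are assumptions made here for illustration, since the abstract does not state which KGE variant was used.

```python
from statistics import mean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs."""
    mu = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mu) ** 2 for o in obs)
    return 1.0 - sse / sst


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def kge(obs, sim):
    """Kling-Gupta efficiency, assuming the original 2009 form:
    1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    with alpha = sd(sim) / sd(obs) and beta = mean(sim) / mean(obs)."""
    r = pearson(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)
    beta = mean(sim) / mean(obs)
    return 1.0 - ((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2) ** 0.5


def make_lagged(rows, max_lag=3):
    """Build one-day-ahead samples from chronologically sorted (date, flow) rows.

    Each sample is (date, [flow at t-1, ..., flow at t-max_lag], flow at t).
    The first max_lag rows are dropped because their lags are incomplete.
    """
    samples = []
    for i in range(max_lag, len(rows)):
        date, target = rows[i]
        lags = [rows[i - k][1] for k in range(1, max_lag + 1)]
        samples.append((date, lags, target))
    return samples


def split_by_year(samples, test_year):
    """Chronological split: years before test_year train, test_year tests.

    Dates are assumed to be ISO strings (YYYY-MM-DD); no shuffling is done,
    so no future observation leaks into the training set.
    """
    train = [s for s in samples if int(s[0][:4]) < test_year]
    test = [s for s in samples if int(s[0][:4]) == test_year]
    return train, test
```

On a toy series with observations `[1, 2, 3, 4]` and a simulation shifted up by one unit, `nse` gives 0.2 while `kge` gives 0.6: the two scores penalize bias differently, which is one general reason model rankings can differ between NSE and KGE. The toy numbers are illustrative only and are unrelated to the study's data.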