A Comparative Analysis of Advanced Machine Learning Techniques for River Streamflow Time-Series Forecasting
Antoifi Abdoulhalik, Ashraf A. Ahmed

This study examines the contribution of rainfall (RF) data to the streamflow-forecasting accuracy of advanced machine learning (ML) models in the Syr Darya River Basin. Long short-term memory (LSTM)-based models were used to investigate five scenarios. In each scenario, RF data from different weather stations were incorporated according to the stations’ geographical positions relative to the flow monitoring station. The All-RF scenario included the rainfall data collected at all 11 stations. The Upstream-RF (Up-RF) and Downstream-RF (Down-RF) scenarios included only the rainfall data measured upstream and downstream of the streamflow-measuring station, respectively. The Pearson-RF (P-RF) scenario included only the rainfall data with the highest correlation with the streamflow data. The Flow-only (FO) scenario included only streamflow data. Model performance was assessed quantitatively using the root-mean-square error (RMSE), the mean absolute error (MAE), and the coefficient of determination (R²). Both models performed best in the FO scenario. This shows that diversifying the input features (hydrological and meteorological data) did not improve predictive accuracy, regardless of the positions of the weather stations. Among the scenarios that included rainfall data, P-RF yielded the best prediction accuracy. This suggests that only rainfall data from upstream of the flow monitoring station tend to contribute positively to the model’s forecasting performance. The findings demonstrate that simple single-layer LSTM-based networks, using only streamflow data as input features, are suitable for high-performance and cost-effective river flow forecasting applications while minimizing data processing time.
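The following minimal Python sketch illustrates the P-RF station-selection step and the three evaluation metrics, using only the standard library. It is not the authors' code, and several parts of it are assumptions. The station names and toy series are hypothetical. The helper `select_p_rf` is also hypothetical: it ranks stations by signed Pearson r (rather than |r|) and keeps the top k, which is only one possible reading of "highest level of correlation". Finally, R² is computed as 1 − SS_res/SS_tot, whereas some hydrology studies report the squared Pearson correlation instead.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_p_rf(rainfall: Dict[str, List[float]], flow: List[float], k: int) -> List[str]:
    """Hypothetical P-RF helper: keep the k stations whose rainfall series
    correlates most positively with streamflow (assumption: signed r, top-k)."""
    ranked = sorted(rainfall, key=lambda s: pearson_r(rainfall[s], flow), reverse=True)
    return ranked[:k]


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the streamflow series."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the streamflow series."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy, made-up data for illustration only (not from the study).
    flow = [10.0, 12.0, 15.0, 11.0, 9.0]
    rainfall = {
        "station_A": [1.0, 2.0, 4.0, 1.5, 0.5],
        "station_B": [3.0, 1.0, 0.0, 2.0, 4.0],
        "station_C": [2.0, 2.1, 2.0, 2.2, 1.9],
    }
    print("P-RF stations:", select_p_rf(rainfall, flow, k=2))

    # Hypothetical model predictions for the same five time steps.
    predicted = [11.0, 12.0, 14.0, 11.0, 10.0]
    print(f"RMSE={rmse(flow, predicted):.3f}  "
          f"MAE={mae(flow, predicted):.3f}  R2={r2(flow, predicted):.3f}")
```

RMSE and MAE are expressed in the units of streamflow, while R² is dimensionless. Reporting all three therefore captures both the size of the errors and the share of variance explained.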