Physics-Informed Data-Driven Models for Streamflow Prediction in Small Catchments: Combining Hydrological Causality and Machine Learning Frameworks
Victor Galán, Rafael Navas, Sergio Zubelzu

Accurate streamflow prediction in small catchments remains challenging because of their rapid response times, threshold-driven behavior, and high spatial heterogeneity. This study develops and evaluates a novel modeling approach that combines physics-informed feature selection with machine learning algorithms. In total, 1825 model configurations were tested across fifteen algorithms (including Random Forest, XGBoost, LightGBM, CatBoost, Support Vector Machines, and deep learning methods), using multiple physics-informed input structures based on classical rainfall–runoff theory and mass-balance conservation. Models were evaluated on predicting minimum, average, and maximum daily water levels and discharge. Models structured around Green-Ampt infiltration assumptions consistently outperformed alternative configurations, with Random Forest performing well for water level predictions. Causal models outperformed autoregressive approaches, while residual analysis revealed limitations in predicting extreme values. Feature importance analysis showed that channel and catchment morphology and initial soil moisture conditions were the dominant predictors, consistent with hydrological process understanding.
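For readers less familiar with the Green-Ampt model referenced above, the following is a minimal sketch of the classical textbook relations, not the authors' feature construction. It assumes continuous ponding at the surface, where the infiltration capacity is f = K_s (1 + ψΔθ / F) and the cumulative infiltration F(t) satisfies F = K_s t + ψΔθ ln(1 + F / (ψΔθ)). K_s is the saturated hydraulic conductivity, ψ the wetting-front suction head, and Δθ the soil moisture deficit. Initial soil moisture enters through Δθ, which is one way to see why antecedent moisture is physically meaningful as a predictor. The function names and the fixed-point solver are illustrative choices.

```python
import math


def green_ampt_rate(F, Ks, psi, dtheta):
    """Green-Ampt infiltration capacity f (same units as Ks) for cumulative infiltration F > 0."""
    if F <= 0:
        raise ValueError("cumulative infiltration F must be positive")
    return Ks * (1.0 + psi * dtheta / F)


def green_ampt_cumulative(t, Ks, psi, dtheta, tol=1e-8, max_iter=1000):
    """Cumulative infiltration F(t) under continuous ponding (assumption).

    Solves F = Ks*t + psi*dtheta*ln(1 + F/(psi*dtheta)) by fixed-point iteration.
    The iteration map has derivative s/(s + F) < 1 for F > 0, so it converges.
    """
    if t <= 0:
        return 0.0
    s = psi * dtheta  # suction head times moisture deficit
    F = Ks * t        # initial guess: infiltration at the saturated rate
    for _ in range(max_iter):
        F_new = Ks * t + s * math.log(1.0 + F / s)
        if abs(F_new - F) < tol:
            return F_new
        F = F_new
    raise RuntimeError("Green-Ampt fixed-point iteration did not converge")


# Illustrative silt-loam parameters: Ks in cm/h, psi in cm, dtheta dimensionless.
F1 = green_ampt_cumulative(1.0, Ks=0.65, psi=16.7, dtheta=0.340)  # after 1 h, in cm
f1 = green_ampt_rate(F1, Ks=0.65, psi=16.7, dtheta=0.340)          # in cm/h
```

A wetter initial state (smaller Δθ) lowers the suction term, so the infiltration capacity decays toward K_s sooner and more rainfall becomes available as runoff.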