Evolutionary-Assisted Data-Driven Approach for Dissolved Oxygen Modeling: A Case Study in Kosovo
Bruno da S. Macêdo, Larissa Lima, Douglas Lima Fonseca, Tales H. A. Boratto, Camila M. Saporetti, Osman Fetoshi, Edmond Hajrizi, Pajtim Bytyçi, Uilson R. V. Aires, Roland Yonaba, Priscila Capriles, Leonardo Goliatt

Dissolved oxygen (DO) is widely recognized as a fundamental water quality parameter, given its critical role in supporting aquatic ecosystems. Accurate estimation of DO levels is crucial for effective management of riverine environments, especially in anthropogenically stressed regions. This study introduces a hybrid machine learning (ML) framework for predicting DO concentrations, in which hyperparameters are optimized through Genetic Algorithm Search with Cross-Validation (GASearchCV). The methodology was applied to a dataset collected from the Sitnica River in Kosovo, comprising more than 18,000 observations of temperature, conductivity, pH, and dissolved oxygen. Three ML models, Elastic Net (EN), Support Vector Regression (SVR), and Light Gradient Boosting Machine (LGBM), were fine-tuned using cross-validation and assessed using five performance metrics: coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean absolute relative error (MARE), and mean square error (MSE). Among these, the LGBM model yielded the best predictive results, achieving, on average, an R² of 0.944 and an RMSE of 8.430 mg/L. A Monte Carlo simulation-based uncertainty analysis further confirmed the model's robustness and enabled an examination of the trade-off between uncertainty and predictive precision. Comparison with recent studies confirms the competitive performance of the proposed framework and demonstrates the effectiveness of automated tuning and ensemble learning for reliable, real-time water quality forecasting. The methodology offers a scalable solution for data-driven water quality forecasting, with direct applicability to real-time environmental monitoring and sustainable resource management.
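The five performance metrics named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `regression_metrics` and the sample values are hypothetical, and MARE is assumed here to be the mean of |observed − predicted| / |observed|, which requires nonzero observations.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, MARE and MSE for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Assumed MARE definition: mean of |error| / |observed| (observations must be nonzero).
    mare = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 = 1 - SS_res / SS_tot; reported as NaN when the observations are constant.
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"R2": r2, "RMSE": math.sqrt(mse), "MAE": mae, "MARE": mare, "MSE": mse}

# Hypothetical DO values, for illustration only (not taken from the Sitnica River data).
observed = [8.0, 10.0, 12.0, 10.0]
predicted = [9.0, 10.0, 11.0, 10.0]
metrics = regression_metrics(observed, predicted)
```

Note that RMSE and MAE carry the units of the target variable, MSE carries squared units, and R² and MARE are dimensionless, so the five values are not directly comparable with one another.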