Prediction of microbiological non-compliances using a Boosted Regression Trees model: application on the drinking water distribution system of a whole country
Mariana Barcia, Alexandra Sixto, Maria Pia Cerdeiras - Water Science and Technology
Abstract
Universal access to safe drinking water is a fundamental human right and a requirement for a healthy life. Therefore, monitoring the quality of the supplied water is of utmost importance. To achieve this goal, there is a need to develop tools that support monitoring activities and improve efficiency. Forecasting models enable the prediction of pollution levels and facilitate the implementation of action plans. In this study, the Boosted Regression Trees method was employed to investigate the variables influencing water quality failures (WQFs) due to microbial contamination at the delivery point. The dataset used was obtained from localities across the country's distribution systems. The variables under consideration included physicochemical parameters such as pH, turbidity (NTU), and free chlorine (mg L−1), along with contextual parameters like the year, season, geographic location, and locality population. Indicators of microbial contamination assessed were the presence of total coliforms, Escherichia coli, and Pseudomonas aeruginosa. The most significant variables were geographic location, free chlorine content, and the population of the locality. The model achieved an AUC value of 0.77 and provided adequate predictions in the conducted tests. It enables the exploration of key factors affecting microbiological water quality, allowing for informed action to reduce associated risks.
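To make the reported AUC of 0.77 easier to interpret, the following is a minimal sketch of how an AUC value can be computed from predicted failure probabilities. It is not the authors' code: the function name `roc_auc`, the labels, and the scores are hypothetical and chosen only for illustration. AUC is treated here as the probability that a randomly chosen failing sample receives a higher predicted score than a randomly chosen compliant one, so 0.5 corresponds to random ranking and 1.0 to perfect ranking.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive sample has the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data (not from the study):
# label 1 = water quality failure, label 0 = compliant sample;
# scores are assumed model-predicted failure probabilities.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.81, 0.30, 0.45, 0.52, 0.10, 0.67, 0.48, 0.05]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 13 of 15 pairs ranked correctly -> AUC = 0.867
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a small illustration; for a dataset of the size described in the abstract, a rank-based implementation from a statistics library would be the more practical choice.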