Wavelet-Enhanced Machine Learning for Seawater Alkalinity Prediction in the Arabian Gulf Using Monitored Water-Quality Variables
Saleh H. Alhathloul, Yazeed Algurainy
Continuous monitoring of seawater alkalinity is essential for maintaining chemical stability in coastal environments and supporting efficient operation of desalination and water-treatment systems; however, direct alkalinity measurements are often limited in temporal resolution. This study develops and evaluates a machine learning framework for estimating seawater alkalinity using quality-controlled daily and sub-daily water-quality observations collected from a coastal monitoring station along the Arabian Gulf coast of eastern Saudi Arabia during 2017–2023. Five machine learning models, Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), Support Vector Regression (SVR), and K-Nearest Neighbors (KNN), are assessed under two configurations: a baseline setup relying on the original predictor variables and an enhanced setup incorporating wavelet-decomposed features to represent multiscale temporal variability. Model performance is evaluated using five-fold cross-validation and quantified using R², root mean square error (RMSE), and mean absolute error (MAE). Under the baseline configuration, ensemble-based models outperform single-estimator and distance-based approaches, with RF achieving the best performance (R² = 0.77, RMSE = 2.57 ppm, MAE = 1.71 ppm). The incorporation of wavelet-based feature enrichment leads to consistent performance improvements across all models, reflected by higher R² values and reduced RMSE and MAE. The wavelet-enhanced RF model exhibits the strongest overall performance, attaining a mean R² of approximately 0.91 together with an RMSE of about 1.6 ppm and an MAE of around 1.0 ppm, while also showing reduced variability across cross-validation folds. The XGB model shows notable improvement with wavelet enrichment, whereas SVR and KNN benefit mainly through moderate error reduction. Overall, the findings show that wavelet-based feature enrichment improves the accuracy and stability of ML models for seawater alkalinity estimation, with RF providing the most reliable performance for coastal monitoring applications.
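As a rough illustration of two ingredients named above, the sketch below computes one level of a wavelet decomposition and the three reported error metrics. It is a minimal sketch under stated assumptions, not the study's pipeline: the abstract does not name the wavelet family or decomposition depth, so the Haar wavelet and a single level are assumptions, the function names `haar_dwt` and `regression_metrics` are hypothetical, and all numeric values are made up for demonstration.

```python
import math


def haar_dwt(signal):
    """One level of the (decimated) Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists, each half the
    input length. The input length must be even.
    Assumption: Haar is used here only because it is the simplest wavelet.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / scale for a, b in pairs]  # smooth, low-frequency part
    detail = [(a - b) / scale for a, b in pairs]  # local, high-frequency part
    return approx, detail


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, and MAE for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Made-up predictor series (illustrative values only, not study data).
series = [1.0, 1.0, 3.0, 3.0]
approx, detail = haar_dwt(series)

# Made-up observed and predicted values to exercise the metrics.
scores = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

In an enhanced configuration of the kind described, coefficients such as `approx` and `detail` would be appended to the original predictors before model fitting, and the metrics would be averaged over the five cross-validation folds. A decimated transform halves the series length, so aligning coefficients with individual time steps would need an undecimated variant or upsampling, which this sketch leaves out.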