Benchmarking Tree-Based Artificial Intelligence Models for Multi-Resolution Solar Irradiance Forecasting Across Various Sky Conditions in Arid Climates
Hasanain A. H. Al-Hilfi, Farhad Shahnia, Seyit Alperen Celtek, Amirmehdi Yazdani, Hai Wang
Integrating solar power into electricity grids requires accurate short-term forecasting of the global horizontal irradiance to predict the expected solar power generation. This paper compares five tree-based machine learning models against a Persistence baseline for multi-resolution forecasting in arid climates. A 13-year dataset from Basra, Iraq, is used, and the models are tested at very-short- to short-term forecasting horizons of 5, 10, 15, 30, and 60 min. Unlike most existing studies, which focus on a single forecasting horizon or on mixed climatic conditions, this work systematically benchmarks multi-resolution irradiance forecasting under distinct sky conditions in a hot arid environment using a strict anti-data-leakage framework. To avoid data leakage, feature engineering uses only lagged inputs. The dataset is split into training, validation, and testing sets (70%, 15%, and 15% of the available data, respectively). The models are then tested separately under clear, partly cloudy, and cloudy skies. The results show that the best model depends heavily on the forecast horizon. For very-short-term predictions, the Persistence model was competitive (RMSE = 21.32 W/m²), while the Gradient Boosting model proved slightly more accurate (RMSE = 17.65 W/m²). For the 60 min horizon, the boosting models took a clear lead. The HistGradientBoosting model achieved a 67% reduction in RMSE compared to the Persistence baseline. The top-performing model also changed with the sky condition and the forecast horizon. Under clear skies, Gradient Boosting was the clear winner at short horizons, while XGBoost performed best at the longer horizons. Under partly cloudy skies, the leading model rotated among the different boosting algorithms.
However, under fully overcast skies, the complex machine learning models failed to capture the chaotic irradiance patterns, making the simple Persistence baseline a necessary reliability safeguard. The results reveal that no single model consistently dominates across all forecasting horizons and weather conditions, which highlights the importance of horizon- and weather-adaptive model selection for operational solar forecasting. Rather than relying on a single universal algorithm, grid operators in arid regions can improve forecasting reliability by dynamically selecting models based on prevailing sky conditions and forecast horizons.
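
The evaluation setup summarized in the abstract (lag-only features, a 70/15/15 split, a Persistence baseline, and RMSE comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the split is assumed to be chronological (the abstract gives only the proportions), and the percentage-reduction formula is the conventional one rather than one quoted from the paper.

```python
import math


def make_lagged_samples(series, n_lags, horizon_steps):
    """Build (features, target) pairs from past values only.

    Features for time t are series[t - n_lags + 1 .. t]; the target is
    series[t + horizon_steps]. No value at or after the target time is
    used as an input, which is what keeps the features leakage-free.
    """
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon_steps):
        X.append(series[t - n_lags + 1 : t + 1])
        y.append(series[t + horizon_steps])
    return X, y


def chronological_split(X, y, train_pct=70, val_pct=15):
    """Split in time order, without shuffling (ASSUMPTION: chronological).

    Integer percentages avoid floating-point rounding at the boundaries.
    The remaining samples (15% by default) form the test set.
    """
    n = len(X)
    i_train = n * train_pct // 100
    i_val = n * (train_pct + val_pct) // 100
    return (
        (X[:i_train], y[:i_train]),
        (X[i_train:i_val], y[i_train:i_val]),
        (X[i_val:], y[i_val:]),
    )


def persistence_forecast(X):
    """Persistence baseline: the forecast equals the latest observation."""
    return [row[-1] for row in X]


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the series (e.g. W/m^2)."""
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def rmse_reduction_pct(rmse_model, rmse_baseline):
    """Relative RMSE improvement over the baseline, in percent
    (ASSUMPTION: conventional definition)."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline
```

Any regressor can be trained on the training portion and scored with `rmse` on the test portion alongside `persistence_forecast`. Repeating that comparison for each horizon and sky condition gives the kind of lookup from which a horizon- and weather-adaptive selection could be made. The value of `horizon_steps` depends on the sampling interval of the data, which the abstract does not state.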