DOI: 10.1002/joc.70489 ISSN: 0899-8418

Solar Radiation Forecasting in Sparse‐Observation Regions Using Long‐Term Reanalysis Data and Leakage‐Aware Boosting

Mehmet Ali Çelik, Ishak Pacal

ABSTRACT

Reliable daily solar radiation forecasts are required for photovoltaic planning, renewable energy scheduling, and climate‐sensitive decision support in regions where long‐term ground observations are limited. This study developed a leakage‐aware machine learning approach for end‐of‐day, next‐day solar radiation forecasting using ERA5‐derived daily meteorological records from Aralık, Iğdır and Tuzluca in eastern Türkiye. The dataset covered 1950–2024 and included 82,176 daily observations. The predictor set combined calendar descriptors, lagged meteorological variables, shifted rolling‐window statistics, short‐term difference features and district information. An evapotranspiration‐related variable was excluded from the main experiments because it showed target‐proximal behaviour and could inflate predictive skill. Six models were first compared under a chronological hold‐out protocol: Persistence, Ridge Regression, Random Forest, XGBoost, LightGBM and CatBoost. The leading boosting models were then examined in rolling‐origin temporal evaluation and leave‐one‐location‐out spatial validation, with Persistence retained as the reference baseline. CatBoost obtained the best chronological test performance, with an RMSE of 2.4733 MJ m⁻² day⁻¹ and an R² of 0.8808, corresponding to an approximately 34% RMSE reduction relative to Persistence. It also gave the most stable results across rolling‐origin test periods. LightGBM provided the best transfer to unseen districts, with a mean leave‐one‐location‐out RMSE of 2.2655 MJ m⁻² day⁻¹ and a mean R² of 0.8945. Ablation and SHAP analyses showed that seasonal position, recent solar radiation, rolling solar regimes, and short‐term atmospheric changes were the dominant sources of predictive information.
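The leakage‐aware feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the column name `ssrd`, the window lengths, the split date and the synthetic data are all assumptions made for illustration, and only the three district names come from the text. The idea shown is that every predictor for day d is built from values observed on day d-1 or earlier, that rolling statistics are shifted before the window is applied, and that Persistence is evaluated on a chronological hold‐out.

```python
import numpy as np
import pandas as pd


def make_features(df, target="ssrd", windows=(3, 7)):
    """Build predictors for day d from values on day d-1 or earlier only."""
    df = df.sort_values(["district", "date"]).copy()
    grouped = df.groupby("district")[target]
    # Lag 1 doubles as the Persistence forecast (tomorrow equals today).
    df["lag1"] = grouped.shift(1)
    # Short-term difference between the two most recent known days.
    df["diff1"] = grouped.shift(1) - grouped.shift(2)
    for w in windows:
        # Shift first, then roll, so the window ends at day d-1.
        df[f"roll_mean_{w}"] = grouped.transform(
            lambda s: s.shift(1).rolling(w).mean()
        )
    # Calendar descriptors: cyclic encoding of seasonal position.
    angle = 2 * np.pi * df["date"].dt.dayofyear / 365.25
    df["doy_sin"] = np.sin(angle)
    df["doy_cos"] = np.cos(angle)
    return df.dropna().reset_index(drop=True)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# Synthetic stand-in data (NOT the ERA5 records used in the study).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=730, freq="D")
doy = dates.dayofyear.to_numpy()
frames = []
for name in ["Aralık", "Iğdır", "Tuzluca"]:
    seasonal = 17 + 10 * np.sin(2 * np.pi * (doy - 80) / 365.25)
    frames.append(
        pd.DataFrame(
            {
                "date": dates,
                "district": name,
                "ssrd": seasonal + rng.normal(0, 3, len(dates)),
            }
        )
    )
raw = pd.concat(frames, ignore_index=True)

feats = make_features(raw)

# Chronological hold-out: every test day lies after every training day.
cutoff = pd.Timestamp("2021-07-01")  # hypothetical split date
train = feats[feats["date"] < cutoff]
test = feats[feats["date"] >= cutoff]

# Persistence baseline: predict day d with the value observed on day d-1.
persistence_rmse = rmse(test["ssrd"], test["lag1"])
print(f"Persistence test RMSE on synthetic data: {persistence_rmse:.3f}")
```

A quick check for leakage in a setup like this is to alter the target on a given day and confirm that the features for that same day do not change. Any learned model would then be fitted on `train` and scored on `test` against the Persistence value.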
