Multi-Source Meteorological–Topographic Modeling of Monthly Power Generation for Mountain Photovoltaic Stations Using Gradient-Boosted Trees
Pengjie Sun, Ming Wang, Dan Meng, Yang Xu, Chi Cheng, Wei Ju
Mountain photovoltaic (PV) stations are increasingly deployed in complex terrain, where generation is jointly controlled by solar-resource variability, near-surface meteorology, and local topography. However, the quantitative contribution of topographic factors to regional-scale PV generation remains insufficiently evaluated, and many prediction studies rely on single-station or short-term records. In this study, monthly measured generation from 118 standardized village-level mountain PV stations in Badong County, western Hubei Province, China (2019–2021), was integrated with Solargis solar-resource data related to Global Horizontal Irradiance (GHI), high-resolution gridded meteorological data, a 25 m digital elevation model, seasonal-cycle variables, and historical-generation features. After seasonally grouped median-absolute-deviation (MAD) outlier screening, GIS-based spatial matching, terrain extraction, and viewshed-derived shading analysis, regression models and climatology baselines were compared under both chronological validation and station-exclusion spatial cross-validation. Under the strict chronological validation, CatBoost achieved the best temporal performance among the tested models (R² = 0.3119, MAE = 2719.7 kWh, RMSE = 3245.6 kWh), slightly outperforming the monthly climatology baseline. In the station-exclusion spatial cross-validation, XGBoost achieved the highest mean R² (0.8659), indicating good spatial transferability to unseen stations. Correlation and partial-correlation analyses showed that the temperature-related variable group and monthly radiation were the dominant meteorological controls, whereas elevation, slope, and terrain shading showed weak direct correlations with monthly generation for already-sited stations. Annual 90% prediction intervals were further estimated using residual bootstrapping, with an empirical coverage of 94.9%.
The proposed framework provides a practical basis for monthly generation forecasting and operational assessment of already-built distributed PV stations in mountainous regions, while its application to greenfield site selection requires additional site engineering and near-field obstruction information.
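Two of the preprocessing and uncertainty steps named in the abstract, seasonally grouped MAD outlier screening and residual-bootstrap annual prediction intervals with an empirical coverage check, can be illustrated with the minimal Python sketch below. It is not the authors' implementation: the function names are hypothetical, and the threshold `k = 3.0`, the 1.4826 normal-consistency factor, the number of bootstrap replicates, and the assumption that monthly residuals are independent and exchangeable across months are all illustrative choices that the abstract does not specify.

```python
import random
import statistics
from collections import defaultdict


def mad_outlier_flags(values, groups, k=3.0):
    """Flag outliers within each group (e.g. season) using a scaled-MAD rule.

    Assumption: k=3.0 and the 1.4826 factor are illustrative; the
    threshold actually used in the study is not stated in the abstract.
    """
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)

    # Per-group median and MAD scaled to approximate a standard deviation.
    stats = {}
    for g, vals in by_group.items():
        med = statistics.median(vals)
        mad = statistics.median([abs(v - med) for v in vals])
        stats[g] = (med, 1.4826 * mad)

    flags = []
    for v, g in zip(values, groups):
        med, scale = stats[g]
        # A zero MAD gives no usable spread, so nothing is flagged in that group.
        flags.append(scale > 0 and abs(v - med) > k * scale)
    return flags


def _quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def annual_bootstrap_interval(monthly_pred, residuals, alpha=0.10,
                              n_boot=2000, seed=0):
    """Annual (1 - alpha) prediction interval by residual bootstrapping.

    Assumption: residuals are independent across months, so each replicate
    adds one residual, drawn with replacement, to each monthly prediction.
    """
    rng = random.Random(seed)
    base = sum(monthly_pred)
    totals = sorted(
        base + sum(rng.choices(residuals, k=len(monthly_pred)))
        for _ in range(n_boot)
    )
    return _quantile(totals, alpha / 2), _quantile(totals, 1 - alpha / 2)


def empirical_coverage(intervals, actuals):
    """Share of observed annual totals that fall inside their intervals."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, actuals))
    return hits / len(actuals)
```

For example, `mad_outlier_flags([10, 11, 12, 11, 100], ["summer"] * 5)` flags only the value 100. Grouping by season before computing the median and MAD keeps ordinary seasonal swings in generation from being mistaken for outliers, and comparing the empirical coverage against the nominal level (here 90%) is one way to check whether the bootstrap intervals are well calibrated.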