Physics-Informed Validation of an XGBoost Decision Layer for SCADA-Based Wind Turbine Anomaly Detection
Shawn Aranda Nyamato, Mwana Wa Kalaga Mbukani, Lebogang Masike

Supervisory control and data acquisition (SCADA) data are increasingly used for wind turbine anomaly detection, but purely data-driven methods can be limited by weak physical interpretability, class imbalance, and reduced generalization under changing wind-farm operating conditions. Although Extreme Gradient Boosting (XGBoost) is effective for structured nonlinear classification, its use in SCADA-based anomaly detection remains constrained by label quality, probability calibration, and cross-farm transferability. This paper validates a physics-informed XGBoost decision layer built on residual-based indicators: power-curve residuals, gearbox and generator thermal residuals, rotor-speed variance, active-power ratio, and wind-speed fluctuation. Logbook labels from the Comprehensive Anomaly Detection Benchmark for Wind Turbine SCADA Data (CARE) serve as the reference labels, while 2σ, 3σ, and 4σ residual thresholds are evaluated as competing rule-based detectors. The decision layer is trained and internally tested using event-grouped chronological splits from Wind Farm A and externally evaluated on unseen Wind Farms B and C. The results show physically interpretable anomaly detection behavior, although performance varies across validation settings. When transferred from Farm A to the unseen Farms B and C, XGBoost achieved row-level F1-scores of 0.6296 and 0.6551, respectively. Shapley additive explanations (SHAP) link anomaly predictions mainly to thermal, power-conversion, and operating-context features. The findings support the proposed decision layer as an interpretable benchmark-validation framework, while showing that additional maintenance-log validation is required before definitive component-level fault-diagnosis claims can be made.
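To make the rule-based baselines concrete, the following is a minimal sketch of a kσ residual detector, not the authors' implementation. It assumes that the mean and standard deviation are estimated from a reference window of normal operation and that a sample is flagged when its residual deviates from that mean by more than kσ. The function name `sigma_flags` and all numeric values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def sigma_flags(residuals, reference, k):
    """Flag residuals lying more than k standard deviations from the reference mean.

    Assumption: `reference` holds residuals from normal operation only.
    """
    mu = mean(reference)
    sigma = stdev(reference)  # sample standard deviation
    return [abs(r - mu) > k * sigma for r in residuals]


# Hypothetical power-curve residuals (measured minus expected power, in kW).
reference = [-12.0, 8.0, -5.0, 10.0, -3.0, 6.0, -9.0, 5.0]  # assumed normal operation
observed = [4.0, -7.0, 20.0, 30.0, 45.0]                    # samples to screen

# Compare how many samples each threshold flags.
for k in (2, 3, 4):
    flags = sigma_flags(observed, reference, k)
    print(f"{k} sigma: {sum(flags)} of {len(observed)} samples flagged")
```

Widening the threshold from 2σ to 4σ flags fewer samples, which is the trade-off between sensitivity and false alarms that a learned decision layer is compared against.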