DOI: 10.3390/electronics15132907 ISSN: 2079-9292

Automated Labeling Procedure for Wind Turbine SCADA Data with Iterative Refinement and Model-Based Validation

Fatima Ez-Zahiri, Xiaoqiang Guo, Damian Grzechca

Effective fault detection is essential for maximizing energy production and ensuring the safe operation of wind turbines. However, supervised AI models for Supervisory Control and Data Acquisition (SCADA)-based condition monitoring are often limited by the lack of reliably labeled datasets. To address this issue, this manuscript proposes an Automated Labeling Procedure (ALP) that generates a structured and reliable labeled dataset from initially unlabeled wind turbine SCADA data. The proposed ALP integrates initial labeling, preprocessing, feature selection, class balancing, model-based validation, selective relabeling, and iterative retraining. A documented gearbox-changeout interval serves as the initial fault-related reference period. A Grid Search-optimized Decision Tree (GS-DT) is employed as the main validation model, while Random Forest and XGBoost are used for comparison. The main contribution is a novel misclassification-guided refinement loop: wherever a provisional label and the model prediction disagree, the disputed sample is analyzed using its SCADA values, its timestamp, and its relation to the fault reference interval before any selective relabeling is performed. The results show that the ALP reduces the labeling task to a small set of disputed samples requiring manual verification, instead of a review of the entire dataset. Through iterative relabeling and retraining, dataset consistency improved and results became stable across models. Overall, the findings demonstrate the suitability of the refined dataset for subsequent wind turbine fault-detection applications.
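
The triage step of the refinement loop can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` structure, the function names, the 0/1 label encoding, and all dates are assumptions made for illustration, and the validation model's predictions are taken as given rather than produced by an actual GS-DT. The sketch assigns provisional labels from a fault reference interval, then collects only the samples where the provisional label and the prediction disagree, together with each sample's distance in hours from that interval, so that a reviewer can check them manually.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    # Hypothetical record; a real SCADA row would also carry sensor values.
    timestamp: datetime
    provisional_label: int  # assumed encoding: 1 = fault-related, 0 = normal
    predicted_label: int    # output of some validation model, supplied by the caller


def initial_label(ts, fault_start, fault_end):
    """Provisional label: 1 inside the fault reference interval, else 0."""
    return 1 if fault_start <= ts <= fault_end else 0


def disputed_samples(samples, fault_start, fault_end):
    """Return (sample, hours_from_interval) for every label/prediction disagreement.

    hours_from_interval is 0.0 for samples inside the reference interval.
    """
    disputed = []
    for s in samples:
        if s.provisional_label == s.predicted_label:
            continue  # model agrees with the provisional label, nothing to review
        if s.timestamp < fault_start:
            gap = fault_start - s.timestamp
        elif s.timestamp > fault_end:
            gap = s.timestamp - fault_end
        else:
            gap = timedelta(0)
        disputed.append((s, gap.total_seconds() / 3600.0))
    return disputed


# Illustrative data only; the interval and timestamps are made up.
fault_start = datetime(2020, 1, 10)
fault_end = datetime(2020, 1, 20)
raw = [
    (datetime(2020, 1, 5), 0),       # (timestamp, assumed model prediction)
    (datetime(2020, 1, 9, 12), 1),
    (datetime(2020, 1, 15), 1),
    (datetime(2020, 1, 18), 0),
]
samples = [
    Sample(ts, initial_label(ts, fault_start, fault_end), pred)
    for ts, pred in raw
]
for sample, hours in disputed_samples(samples, fault_start, fault_end):
    print(sample.timestamp, sample.provisional_label, sample.predicted_label, hours)
```

In this sketch the relabeling decision itself is deliberately left to the reviewer; only the disputed subset is surfaced, which mirrors the idea of avoiding a review of the whole dataset.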
