A comparative study of CNN and vision transformer architectures for fault detection in small wind turbine blades

DOI: 10.58559/ijes.1936625 ISSN: 2717-7513

Muhammet Fatih Aslan, Büşra Aslan, Selami Balcı

Automated fault detection in wind turbine blades is critical for ensuring the reliability and operational efficiency of wind energy systems. This study presents a systematic comparative analysis of five state-of-the-art deep learning architectures for binary fault classification (healthy versus faulty) on the CAI-SWTB dataset, comprising 6,000 RGB images of small wind turbine blades. The evaluated architectures span three distinct design paradigms: a classical Convolutional Neural Network (CNN) (ResNet-50), a modern CNN (EfficientNetV2-S), and three Vision Transformer (ViT)-based models (ViT-B/16, Data-efficient Image Transformer (DeiT)-S/16, and Swin-Tiny). All models were trained using a two-stage transfer learning protocol with ImageNet-pretrained weights, employing the AdamW optimizer and a cosine annealing learning rate schedule. EfficientNetV2-S achieved the highest classification performance with 99.75% accuracy, followed by ResNet-50 at 99.42%. Among the transformer-based models, Swin-Tiny outperformed both ViT-B/16 (65.33%) and DeiT-S/16 (82.75%), achieving 88.25% accuracy. Grad-CAM analysis confirmed that the best-performing models correctly localize structural defect regions in blade images, supporting their interpretability and suitability for real-world inspection applications.
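The cosine annealing learning rate schedule named in the abstract can be illustrated with a minimal, framework-free sketch. It uses the standard cosine annealing formula without restarts; the function name `cosine_annealing_lr` and the values for the base learning rate, minimum learning rate, and number of epochs are assumptions chosen for illustration, since the abstract does not report the study's hyperparameters.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step` under cosine annealing without restarts.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end stay at min_lr.
    progress = min(max(step, 0), total_steps) / total_steps
    # Half a cosine period: starts at base_lr, decays smoothly to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed values (not reported in the abstract): 20 epochs, 1e-3 -> 1e-6.
schedule = [cosine_annealing_lr(e, 20, base_lr=1e-3, min_lr=1e-6) for e in range(21)]
```

The schedule decays slowly at first, fastest around the midpoint, and flattens out near the end, which is one reason it is commonly paired with fine-tuning of pretrained weights. In a PyTorch training loop the same behavior would typically come from the built-in scheduler rather than a hand-written function.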
