On the Role of Feature Extraction in Transformer PD Severity Classification: A Controlled Comparison of PCA and Autoencoder Models
Lucas Thobejane, Bonginkosi Thango
This paper applies the comparative PCA-ANN vs. Autoencoder-ANN framework to transformer partial discharge (PD) severity classification, using a 294-sample dataset spanning four severity classes: Normal, Low PD, Medium PD, and High PD. Two raw measurements, discharge magnitude (pC) and applied voltage (kV), are expanded into a 15-dimensional physics-informed feature space. Linear (PCA) and nonlinear (bottleneck Autoencoder) feature extraction are each evaluated exhaustively across all latent dimensions k = 1–15, with the extracted features fed to an identical ANN classifier. PCA + ANN reaches 100.0% test accuracy at k = 9, while Autoencoder + ANN reaches 98.3% at k = 8. The superior performance of PCA + ANN on this dataset is attributed to the low intrinsic dimensionality of the two-measurement PD feature space and to the high separability of the PD severity classes in the engineered ratio feature space. The Autoencoder reaches its best accuracy with a more compact latent representation (k = 8 vs. k = 9) but misclassifies samples of the Normal class, which is severely under-represented in the dataset. Cross-validation indicates that PCA + ANN is stable (97.4 ± 0.9% vs. 97.0 ± 1.0% for Autoencoder + ANN). Together with the companion DGA study, these results provide a complete baseline for comparing linear and nonlinear feature extraction across two transformer diagnostic modalities.
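The exhaustive sweep over latent dimensions k = 1–15 can be illustrated with a minimal sketch of the PCA branch. This is not the authors' implementation: the data are synthetic (random numbers shaped like a 294-sample, 15-feature, four-class problem), the 80/20 train/test split is an assumption, and a nearest-centroid rule stands in for the ANN classifier so that the example needs only NumPy. All function names (`fit_pca`, `project`, `nearest_centroid_accuracy`, `sweep`) are hypothetical.

```python
import numpy as np


def fit_pca(X_train):
    """Fit standardisation and PCA on the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X_train - mean) / std
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2
    return mean, std, Vt, var / var.sum()


def project(X, mean, std, components, k):
    """Standardise X with the training statistics and keep the first k components."""
    return ((X - mean) / std) @ components[:k].T


def nearest_centroid_accuracy(Z_train, y_train, Z_test, y_test):
    """Stand-in classifier (NOT the paper's ANN): assign each sample to the nearest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())


def sweep(X_train, y_train, X_test, y_test):
    """Evaluate every latent dimension k = 1..d with the same downstream classifier."""
    mean, std, comps, evr = fit_pca(X_train)
    results = []
    for k in range(1, X_train.shape[1] + 1):
        Z_tr = project(X_train, mean, std, comps, k)
        Z_te = project(X_test, mean, std, comps, k)
        acc = nearest_centroid_accuracy(Z_tr, y_train, Z_te, y_test)
        results.append((k, acc, float(evr[:k].sum())))
    return results


# Synthetic placeholder data, NOT the paper's PD dataset.
rng = np.random.default_rng(0)
n_samples, n_features = 294, 15
y = rng.integers(0, 4, size=n_samples)
X = rng.normal(size=(n_samples, n_features)) + y[:, None] * np.linspace(1.0, 0.0, n_features)

# Assumed 80/20 split; the split used in the paper is not stated here.
idx = rng.permutation(n_samples)
n_train = int(0.8 * n_samples)
train, test = idx[:n_train], idx[n_train:]

results = sweep(X[train], y[train], X[test], y[test])
for k, acc, cum_var in results:
    print(f"k={k:2d}  test accuracy={acc:.3f}  cumulative explained variance={cum_var:.3f}")
```

Fitting the scaler and the principal directions on the training split alone, then reusing them for the test split, keeps test information out of the feature extractor. The Autoencoder branch would follow the same loop with the projection replaced by a trained encoder of bottleneck size k.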