DOI: 10.1177/14727978251341494 ISSN: 1472-7978

Research on the regional classification of folk songs integrating CNN and transformer

Xueqing Huang, Na Long, Junxin Zhang, Tianling Li, FengLing Wang, Lei Liao

Folk songs are a vital component of regional culture, marked by distinctive local traits and ethnic character. Conventional classification techniques struggle with the diversity and complexity of dialects in folk music. TransCNN, a new model integrating convolutional neural networks (CNNs) and Transformers, was developed to address this problem. The approach uses a CNN to extract local features from the songs, then encodes and decodes these features globally through the Transformer's self-attention mechanism. It also applies several data augmentation techniques to strengthen the model's ability to identify the attributes of folk song dialects and to improve its overall performance. In the evaluations, TransCNN attained a classification accuracy of 92.3%, a recall of 91.8%, and an F1 score of 91.9%. The findings indicate that TransCNN handles varied audio types, background noise, and tonal variation, meeting the accuracy and reliability requirements of this folk song classification task.
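As a reading aid for the reported accuracy, recall, and F1 figures, the following is a minimal sketch of how such metrics are commonly computed for a multi-class classifier. The abstract does not say which averaging scheme was used, so macro-averaging is an assumption here. The function name `macro_metrics` and the region labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for multi-class labels.

    Macro-averaging is an assumption; the abstract does not name a scheme.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fn[truth] += 1   # the true class was missed
            fp[pred] += 1    # the predicted class was wrongly assigned

    recalls, f1s = [], []
    for c in labels:
        pred_total = tp[c] + fp[c]
        true_total = tp[c] + fn[c]
        precision = tp[c] / pred_total if pred_total else 0.0
        recall = tp[c] / true_total if true_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)


# Hypothetical region labels, for illustration only (not data from the paper).
y_true = ["north", "north", "south", "south", "west", "west"]
y_pred = ["north", "south", "south", "south", "west", "north"]
accuracy, recall, f1 = macro_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a per-class F1 always lies between that class's precision and recall. Under macro-averaging the averaged F1 need not lie between the averaged precision and recall, which is one reason the averaging scheme matters when comparing reported figures.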
