DOI: 10.3397/1/377122

A new speech enhancement method based on Swin-UNet model

Chengli Sun, Weiqi Jiang, Yan Leng, Feilong Chen
  • Industrial and Manufacturing Engineering
  • Public Health, Environmental and Occupational Health
  • Mechanical Engineering
  • Acoustics and Ultrasonics
  • Aerospace Engineering
  • Automotive Engineering
  • Building and Construction

U-shaped networks (UNet) have shown excellent performance in a variety of speech enhancement tasks. However, because convolution is an inherently local operation, a traditional UNet built from convolutional neural network (CNN) blocks cannot learn global, long-term information well. In this work, we propose a new speech enhancement method based on Swin-UNet. Unlike the traditional UNet model, all CNN blocks are replaced with Swin-Transformer blocks in order to capture more multi-scale contextual information. The Swin-UNet model employs a shifted-window mechanism, which reduces the high computational complexity of the standard Transformer while still enabling information exchange across windows, so that the Transformer's strong global modeling capability is retained. Through hierarchical Swin-Transformer blocks, both global and local speech features can be fully leveraged to improve speech reconstruction. Experimental results confirm that the proposed method can eliminate more background noise while maintaining good objective speech quality.
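
The shifted-window idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 8x8 grid standing in for a time-frequency feature map, the window size of 4, and all function names are assumptions made for illustration, and the attention masking that Swin-Transformer applies to wrapped-around positions is omitted. The sketch shows only how tokens are grouped into windows, how a cyclic shift by half a window makes the next layer's windows straddle the previous boundaries, and how restricting attention to windows reduces the number of token pairs.

```python
def window_partition(grid, window):
    """Split a 2-D grid (list of rows) into non-overlapping window x window tiles."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be a multiple of the window size")
    tiles = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            tiles.append([row[left:left + window] for row in grid[top:top + window]])
    return tiles


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rows]


def shifted_window_partition(grid, window):
    """Partition after shifting by half a window, so tiles straddle old boundaries."""
    return window_partition(cyclic_shift(grid, window // 2), window)


def attention_pair_counts(h, w, window):
    """Token pairs scored by global attention vs. attention restricted to windows."""
    tokens = h * w
    num_windows = (h // window) * (w // window)
    return tokens * tokens, num_windows * (window * window) ** 2


# Hypothetical 8x8 feature map; each cell holds its own flat index for easy tracing.
grid = [[r * 8 + c for c in range(8)] for r in range(8)]
regular = window_partition(grid, 4)
shifted = shifted_window_partition(grid, 4)
global_pairs, windowed_pairs = attention_pair_counts(8, 8, 4)
```

In this toy setting the first shifted window covers rows 2 to 5 and columns 2 to 5 of the original grid, so it mixes tokens from all four regular windows, which is how alternating regular and shifted layers lets information cross window boundaries. The pair count drops from 4096 for global attention to 1024 for windowed attention, and the gap widens as the feature map grows while the window size stays fixed.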
