Chaitanya Jannu, Sunny Dayal Vanambathina

An Overview of Speech Enhancement Based on Deep Learning Techniques

  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Recent years have seen a significant amount of research in the area of speech enhancement. This review surveys several speech enhancement methods and the role of Deep Neural Networks (DNNs) in speech enhancement. Speech signals are frequently degraded by ambient noise, background noise, and reverberation. Short-time processing methods such as the Short-Time Fourier Transform (STFT), short-time autocorrelation, and Short-Time Energy (STE) can be used to enhance speech. To reduce noise, features such as Mel-Frequency Cepstral Coefficients (MFCCs), the Logarithmic Power Spectrum (LPS), and Gammatone Frequency Cepstral Coefficients (GFCCs) can be extracted and fed to a DNN. DNNs are central to speech enhancement because they learn models from large amounts of training data, and the quality of the enhanced speech is assessed with standard performance metrics. This study examines a variety of speech enhancement methods published since the early deep learning work of 1993. The review provides a thorough examination of the neural network architectures, training algorithms, activation functions, training targets, acoustic features, and databases employed for the task of speech enhancement, gathered from articles published between 1993 and 2022.
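To make the feature-to-DNN pipeline described in the abstract concrete, the sketch below (not taken from the review itself) extracts log-power-spectrum features from a noisy recording and passes them frame by frame through a small feed-forward network that predicts a time-frequency mask. The file name noisy.wav, the network sizes, and the use of librosa and PyTorch are illustrative assumptions, not details from the paper.

    # Minimal sketch of a masking-based enhancement pipeline.
    # Assumptions for illustration only: librosa for signal processing,
    # PyTorch for the DNN, a hypothetical input file "noisy.wav".
    import numpy as np
    import librosa
    import torch
    import torch.nn as nn

    def log_power_spectrum(wav_path, n_fft=512, hop_length=128):
        """STFT of the noisy signal -> log-power-spectrum (LPS) frames."""
        y, sr = librosa.load(wav_path, sr=16000)
        stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
        lps = np.log(np.abs(stft) ** 2 + 1e-8)          # (freq_bins, frames)
        return lps.T.astype(np.float32), stft           # one row per frame

    # Small fully connected DNN mapping one noisy LPS frame to a mask
    # over the same frequency bins (values in [0, 1]).
    n_bins = 512 // 2 + 1
    mask_net = nn.Sequential(
        nn.Linear(n_bins, 1024), nn.ReLU(),
        nn.Linear(1024, 1024), nn.ReLU(),
        nn.Linear(1024, n_bins), nn.Sigmoid(),
    )

    # Inference sketch: estimate a mask per frame, apply it to the noisy
    # STFT, and invert back to a time-domain waveform.
    features, noisy_stft = log_power_spectrum("noisy.wav")
    with torch.no_grad():
        mask = mask_net(torch.from_numpy(features)).numpy().T  # (freq, frames)
    enhanced = librosa.istft(noisy_stft * mask, hop_length=128)

In practice the network would first be trained on pairs of noisy and clean speech (the training targets, features, and architectures compared in the review); the untrained network here only shows the shape of the data flow.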
