Speaker Recognition Based on the Joint Loss Function

doi:10.3390/electronics12163447

Tengteng Feng, Houbin Fan, Fengpei Ge, Shuxin Cao, Chunyan Liang

Electrical and Electronic Engineering
Computer Networks and Communications
Hardware and Architecture
Signal Processing
Control and Systems Engineering

The statistical pyramid dense time-delay neural network (SPD-TDNN) model struggles with imbalanced training data, carries a high risk of overfitting, and generalizes poorly. To address these problems, we propose a method based on a joint loss function and an improved statistical pyramid dense time-delay neural network (JLF-ISPD-TDNN). It improves on the SPD-TDNN model and uses a joint loss function that combines the advantages of the cross-entropy loss and a contrastive learning loss. By minimizing the distance between speech embeddings from the same speaker and maximizing the distance between speech embeddings from different speakers, the model can achieve better generalization and a more robust speaker feature representation. We evaluated the proposed method using the equal error rate (EER) and the minimum detection cost function (minDCF). On the Aishell-1 dataset, the EER and minDCF reached 1.02% and 0.1221%, respectively. These results indicate that using the joint loss function in the improved SPD-TDNN model significantly enhances speaker recognition performance.
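The joint objective described above can be sketched in a few lines. This is a minimal, pure-Python illustration under stated assumptions, not the authors' implementation: the contrastive term is assumed to be the classic margin-based pairwise contrastive loss (same-speaker pairs are penalized by their squared Euclidean distance, different-speaker pairs only when they fall inside a margin), and the weighting factor `lam`, the `margin` value, and all function names are hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def cross_entropy(logits, label):
    """Softmax cross-entropy for one utterance, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(embeddings, labels, margin=1.0):
    """Assumed pairwise contrastive loss, averaged over all pairs in the batch.

    Same-speaker pairs add their squared distance (pulls them together);
    different-speaker pairs add (margin - d)^2 only while d < margin (pushes them apart).
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(embeddings)), 2):
        d = euclidean(embeddings[i], embeddings[j])
        if labels[i] == labels[j]:
            total += d ** 2
        else:
            total += max(0.0, margin - d) ** 2
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


def joint_loss(logits, embeddings, labels, lam=0.5, margin=1.0):
    """Hypothetical joint objective: mean cross-entropy + lam * contrastive loss."""
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / len(labels)
    return ce + lam * contrastive_loss(embeddings, labels, margin)


# Toy batch: four utterances from two speakers (labels 0 and 1).
labels = [0, 0, 1, 1]
logits = [[2.0, 0.1], [1.5, 0.3], [0.2, 1.8], [0.1, 2.2]]
# Well-separated embeddings: same speaker close together, different speakers far apart.
good = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Collapsed embeddings: every utterance maps to the same point.
bad = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
print(joint_loss(logits, good, labels) < joint_loss(logits, bad, labels))  # True
```

With identical classifier logits, only the embedding geometry differs between the two batches, so the lower joint loss for `good` comes entirely from the contrastive term. Weighting that term with `lam` lets cross-entropy keep driving speaker classification while the pairwise term shapes the embedding space; both `lam` and `margin` would need tuning in practice.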
