A Human Auditory Perception Loss Function Using Modified Bark Spectral Distortion for Speech Enhancement

Authors: Xiaofeng Shu, Yi Zhou, Hongqing Liu, Trieu-Kien Truong

Abstract

Human listeners often have difficulty understanding speech in the presence of background noise in everyday communication environments. Recently, deep neural network (DNN)-based techniques have been successfully applied to speech enhancement and have achieved significant improvements over conventional approaches. However, existing DNN-based methods usually minimize a log-power spectral-based or masking-based mean squared error (MSE) between the enhanced output and the training target (e.g., the ideal ratio mask (IRM) of the clean speech), a criterion that is not closely related to human auditory perception. In this letter, a modified Bark spectral distortion (MBSD) loss function, which can be regarded as an auditory perception-based MSE, is proposed to replace the conventional MSE in DNN-based speech enhancement approaches and further improve objective perceptual quality. Experimental results show that, compared with DNN-based methods using the conventional MSE criterion, the proposed method improves speech enhancement performance, especially in terms of objective perceptual quality, in all experimental settings.
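
To make the idea of an "auditory perception-based MSE" concrete, the following is a minimal sketch that measures the squared error on Bark-band energies instead of on individual frequency bins. It is an illustration only, not the loss defined in the paper: the function names (hz_to_bark, bark_filterbank, bark_mse), the 18-band rectangular filterbank, and the use of the Zwicker-Terhardt Hz-to-Bark approximation are all assumptions made here, and any further perceptual processing in the published MBSD measure is omitted.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed here)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def bark_filterbank(n_bins, sample_rate, n_bands=18):
    """Hypothetical rectangular filterbank: each FFT bin belongs to one Bark band."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bark = hz_to_bark(freqs)
    # Band edges equally spaced on the Bark axis.
    edges = np.linspace(bark[0], bark[-1], n_bands + 1)
    # Interior edges split the bins into band indices 0 .. n_bands - 1.
    band_idx = np.digitize(bark, edges[1:-1])
    fb = np.zeros((n_bands, n_bins))
    fb[band_idx, np.arange(n_bins)] = 1.0
    return fb

def bark_mse(power_est, power_ref, fb):
    """MSE between Bark-band energies of two power spectrograms (frames x bins)."""
    bark_est = power_est @ fb.T
    bark_ref = power_ref @ fb.T
    return float(np.mean((bark_est - bark_ref) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fb = bark_filterbank(n_bins=257, sample_rate=16000)
    clean = rng.random((10, 257))          # stand-in for a clean power spectrogram
    enhanced = clean + 0.1 * rng.random((10, 257))
    print(bark_mse(enhanced, clean, fb))
```

Because the bands widen with frequency on the Bark scale, errors in many high-frequency bins are pooled into a few bands, whereas a bin-wise MSE weights every bin equally. In a training setup the same computation would be written with differentiable tensor operations.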

Keywords: Speech enhancement, Loss function, MSE, MBSD, DNN

Paper URL: https://doi.org/10.1007/s11063-020-10212-z