Self-labeling with feature transfer for speech emotion recognition
Abstract
Frame-based speech emotion recognition methods have achieved good results in many applications. However, they segment each speech sample into smaller frames and give every frame the same emotion label as the whole sample. This ignores the possibility that a single speech sample contains several emotional categories at the same time. This paper therefore proposes a self-labeling (SL) learning method for speech emotion recognition. The method automatically segments each speech sample into frames, labels each frame with its corresponding emotion tag, and checks the compatibility of these tags. A time–frequency deep neural network for speech emotion recognition is then designed and trained. Because most speech emotion datasets are very small, a feature transfer model trained on large-scale audio data is applied to further improve the performance of the SL learning method. Experimental results on several datasets demonstrate the effectiveness of the proposed method.
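The frame-level self-labeling idea can be illustrated with a minimal Python sketch. It is not the authors' method. The frame length, the hop size, the class list, the `COMPATIBLE` table and the fallback rule are all hypothetical assumptions. The per-frame scores stand in for the output of some frame-level classifier. Each frame takes its highest-scoring emotion. If that emotion is not compatible with the utterance-level label, the frame falls back to the utterance label.

```python
import numpy as np

def segment_frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing remainder is dropped."""
    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

# Hypothetical compatibility table: emotions assumed allowed to co-occur
# with each utterance-level label (illustrative only, not from the paper).
COMPATIBLE = {
    "angry": {"angry", "neutral"},
    "happy": {"happy", "neutral"},
    "neutral": {"neutral", "happy", "sad", "angry"},
    "sad": {"sad", "neutral"},
}

def self_label(frame_scores, utterance_label, classes, compatible=COMPATIBLE):
    """Label each frame with its top-scoring class, falling back to the
    utterance label when the prediction fails the compatibility check."""
    labels = []
    for scores in np.asarray(frame_scores):
        predicted = classes[int(np.argmax(scores))]
        if predicted in compatible[utterance_label]:
            labels.append(predicted)
        else:
            labels.append(utterance_label)
    return labels

classes = ["angry", "happy", "neutral", "sad"]
frames = segment_frames(np.arange(10.0), frame_len=4, hop=2)
scores = [[0.1, 0.7, 0.1, 0.1],   # "happy" is not compatible with "angry"
          [0.2, 0.1, 0.6, 0.1],   # "neutral" is compatible
          [0.8, 0.1, 0.05, 0.05]] # "angry"
frame_labels = self_label(scores, "angry", classes)
```

With these toy inputs, `frames` holds four overlapping frames of length 4 and `frame_labels` is `["angry", "neutral", "angry"]`. A real system would derive the scores from learned features and would need a principled compatibility criterion. The fixed table here only shows where such a check fits.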
Keywords: Speech emotion recognition, Deep neural network, Self-labeled, Speech frame, Transfer learning
Article history: Received 17 April 2022, Revised 28 July 2022, Accepted 30 July 2022, Available online 5 August 2022, Version of Record 28 August 2022.
Official paper URL: https://doi.org/10.1016/j.knosys.2022.109589