Posterior-thresholding feature extraction for paralinguistic speech classification
Abstract
The standard approach to computational paralinguistic speech tasks is to extract several thousand utterance-level features from each speech excerpt and to classify them with machine learning methods such as Support Vector Machines and Deep Neural Networks (DNNs). In contrast, Automatic Speech Recognition processes the speech signal in small, equal-sized parts called frames. Although the speech community has developed efficient techniques for frame classification, these efforts have mostly been ignored in computational paralinguistics. In this study we propose a simple, three-step technique that brings frame-level DNN training know-how to computational paralinguistics. We show that this method provides good accuracy scores by itself, and that combining it with the standard paralinguistic classification approach brings us close to the performance of heavyweight, state-of-the-art techniques such as Fisher vector analysis. Unlike those techniques, our approach can be easily realized with standard speech recognition tools. To demonstrate the general applicability of the proposed three-step method, we performed experiments on four different corpora covering different paralinguistic tasks. We improved on the baseline score in all four cases, with relative error reductions of up to 19%.
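The abstract does not spell out the three steps, so the following is only a minimal sketch of what the "posterior thresholding" in the title might look like. It assumes that a frame-level classifier has already produced class posteriors for every frame of an utterance, and that the utterance-level feature vector consists of the fraction of frames whose posterior reaches each of a few thresholds. The function name `threshold_features`, the data layout, and the threshold values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def threshold_features(posteriors: Sequence[Sequence[float]],
                       thresholds: Sequence[float]) -> List[float]:
    """Map the frame-level posteriors of one utterance to a fixed-length vector.

    posteriors: one row per frame, one column per class (assumed layout).
    thresholds: cut-off values in [0, 1] (assumed, not from the paper).
    Returns, for every class and every threshold, the fraction of frames
    whose posterior for that class is at least the threshold.
    """
    if not posteriors:
        raise ValueError("utterance has no frames")
    n_frames = len(posteriors)
    n_classes = len(posteriors[0])
    features = []
    for c in range(n_classes):
        for t in thresholds:
            count = sum(1 for frame in posteriors if frame[c] >= t)
            features.append(count / n_frames)
    return features


# Four frames, two classes; the values are made up for illustration.
utterance = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
feats = threshold_features(utterance, thresholds=[0.25, 0.5, 0.75])
```

Under these assumptions, every utterance yields a vector of the same length (number of classes times number of thresholds) regardless of its duration, so it could be passed to an utterance-level classifier or concatenated with a conventional utterance-level feature set.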
Keywords: Speech processing, Computational paralinguistics, Deep Neural Networks, Feature extraction, Classifier combination
Article history: Received 24 October 2018, Revised 7 August 2019, Accepted 11 August 2019, Available online 16 August 2019, Version of Record 5 November 2019.
Official article URL: https://doi.org/10.1016/j.knosys.2019.104943