ORVAE: One-Class Residual Variational Autoencoder for Voice Activity Detection in Noisy Environment

Authors: Hasam Khalid, Shahroz Tariq, TaeSoo Kim, Jong Hwan Ko, Simon S. Woo

Abstract

Detecting human speech is foundational for a wide range of emerging intelligent applications. However, accurately detecting human speech is challenging, especially in the presence of unknown noise patterns. Generally, deep learning-based methods have been shown to be more robust and accurate than statistical methods and other existing approaches. However, creating a noise-robust and more generalized deep learning-based voice activity detection system typically requires collecting an enormous amount of annotated audio data. In this work, we develop a generalized model trained on a limited variety of human speech with noisy backgrounds. Nevertheless, it can detect human speech in the presence of various unseen noise types that were not present in the training set. To achieve this, we propose a one-class residual connections-based variational autoencoder (ORVAE), which requires only a limited amount of human speech data with noisy backgrounds for training, thereby eliminating the need to collect data with diverse noise patterns. We evaluate ORVAE on three different datasets (synthesized TIMIT and NOISEX-92, synthesized LibriSpeech and NOISEX-92, and a Publicly Recorded dataset). Our method outperforms other one-class baseline methods, achieving \(F_1\)-scores of over \(90\%\) at multiple signal-to-noise ratio levels.
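
To make the one-class idea and the reported metric concrete, the following is a minimal sketch of a generic one-class decision rule followed by an \(F_1\) computation. It is not taken from the paper: the use of a reconstruction error as the score, the quantile-based threshold, the function names (`fit_threshold`, `predict_speech`, `f1_score`), and all numeric values are assumptions made for illustration only. In practice the errors would come from a trained autoencoder rather than hand-written lists.

```python
# Illustrative sketch only: a generic one-class decision rule, NOT the
# paper's exact method. All names and numbers below are hypothetical.

def fit_threshold(train_errors, quantile=0.9):
    """Choose a threshold from errors measured on speech-only training clips."""
    ordered = sorted(train_errors)
    # Index of the requested quantile, clamped to the last valid position.
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]

def predict_speech(errors, threshold):
    """Label a clip as speech (1) if its error is within the threshold, else 0."""
    return [1 if e <= threshold else 0 for e in errors]

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical reconstruction errors of speech clips seen during training.
train_errors = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.14, 0.12, 0.11, 0.15]
threshold = fit_threshold(train_errors)

# Hypothetical test clips: errors and ground-truth labels (1 = speech).
test_errors = [0.10, 0.50, 0.12, 0.80]
test_labels = [1, 0, 1, 1]

predictions = predict_speech(test_errors, threshold)
score = f1_score(test_labels, predictions)
print(predictions, round(score, 3))
```

The key design point of a one-class setup is that the threshold is fitted from the positive class alone (here, speech with background noise), so no examples of the unseen noise types are needed at training time.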

Keywords: Variational autoencoder, One-class classification, Voice activity detection

Paper URL: https://doi.org/10.1007/s11063-021-10695-4