Detection of activity and position of speakers by using deep neural networks and acoustic data augmentation

Authors:

Highlights:

• Innovative neural-based framework for speaker activity and position detection.

• Exploitation of multiple audio features for improved speech detection.

• Ad-hoc cascaded CNN architectures for real-world speech detection and localization.

• Acoustic scene simulation for training data augmentation to enhance performance.

• Extensive experimental evaluation and notable improvement over the state of the art.

Keywords: Voice activity detection, Speaker localization, Data augmentation, Multi-room environment, Deep learning

Review history: Received 29 January 2019, Revised 30 April 2019, Accepted 13 May 2019, Available online 16 May 2019, Version of Record 4 June 2019.

Official paper URL: https://doi.org/10.1016/j.eswa.2019.05.017