Detection of activity and position of speakers by using deep neural networks and acoustic data augmentation

Authors:

Highlights:

• Innovative neural-based framework for speaker activity and position detection.

• Exploitation of multiple audio features for improved speech detection.

• Ad-hoc cascaded CNN architectures for real-world speech detection and localization.

• Acoustic scene simulation for training data augmentation to enhance performance.

• Extensive experimental evaluation showing relevant improvement with respect to the state of the art.

Keywords: Voice activity detection, Speaker localization, Data augmentation, Multi-room environment, Deep learning

Review history: Received 29 January 2019, Revised 30 April 2019, Accepted 13 May 2019, Available online 16 May 2019, Version of Record 4 June 2019.

Official paper URL: https://doi.org/10.1016/j.eswa.2019.05.017