Detection of activity and position of speakers by using deep neural networks and acoustic data augmentation
Authors:
Highlights:
• Innovative neural-based framework for speaker activity and position detection.
• Exploitation of multiple audio features for improved speech detection.
• Ad-hoc cascaded CNN architectures for real-world speech detection and localization.
• Acoustic scene simulation for training data augmentation to enhance performance.
• Extensive experimental evaluation and significant improvement over the state of the art.
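The highlights mention acoustic scene simulation for training data augmentation. The following is only an illustration and is not the authors' pipeline, whose details are not given here. A common way to simulate an acoustic scene is to convolve dry speech with a room impulse response (RIR) and then add background noise at a chosen signal-to-noise ratio (SNR). The function names `synthetic_rir` and `simulate_scene` are hypothetical. The exponentially decaying noise RIR is a crude stand-in for a proper room simulator such as an image-source model, and the test tone stands in for real speech.

```python
import numpy as np

# Hypothetical helper names for illustration; not taken from the paper.


def synthetic_rir(rt60, fs, rng):
    """Toy RIR: white noise under an exponential decay envelope.

    Simplified stand-in for a real room simulator. The amplitude envelope
    falls by 60 dB after `rt60` seconds. The result is normalized to a peak of 1.
    """
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    # ln(10 ** (-60 / 20)) = -6.9078, so the envelope is at -60 dB when t = rt60
    envelope = np.exp(-6.9078 * t / rt60)
    h = rng.standard_normal(n) * envelope
    return h / np.max(np.abs(h))


def simulate_scene(speech, rir, noise, snr_db):
    """Convolve dry speech with an RIR and add noise at a target SNR in dB."""
    # Reverberant speech, truncated to the dry-signal length
    reverberant = np.convolve(speech, rir)[: len(speech)]
    # Repeat the noise if it is shorter than the speech, then crop it
    if len(noise) < len(reverberant):
        noise = np.tile(noise, int(np.ceil(len(reverberant) / len(noise))))
    noise = noise[: len(reverberant)]
    # Pick a gain so that 10 * log10(P_speech / P_noise) equals snr_db
    p_speech = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return reverberant + gain * noise


# Example: 1 s of a 220 Hz tone as a placeholder "speech" signal at 16 kHz
rng = np.random.default_rng(0)
fs = 16000
speech = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
rir = synthetic_rir(rt60=0.5, fs=fs, rng=rng)
noise = rng.standard_normal(fs // 2)  # shorter than the speech, so it gets tiled
mix = simulate_scene(speech, rir, noise, snr_db=10.0)
```

One way to turn this into an augmentation loop is to draw `rt60`, `snr_db`, and the noise clip at random for each training example. A multi-microphone setup would need one RIR per source-microphone pair.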
Keywords: Voice activity detection, Speaker localization, Data augmentation, Multi-room environment, Deep learning
Article history: Received 29 January 2019, Revised 30 April 2019, Accepted 13 May 2019, Available online 16 May 2019, Version of Record 4 June 2019.
Paper URL: https://doi.org/10.1016/j.eswa.2019.05.017