DNN-Based Cross-Lingual Voice Conversion Using Bottleneck Features

Authors: M. Kiran Reddy, K. Sreenivasa Rao

Abstract

Cross-lingual voice conversion (CLVC) is challenging because the source and target speakers speak different languages. It is essential for applications such as mixed-language speech synthesis systems and the customization of speaking devices. This paper proposes a deep neural network (DNN)-based approach to CLVC that uses bottleneck features. In the proposed method, the speaker-independent information present in speech signals from different languages is represented by bottleneck features extracted from a deep autoencoder. A DNN model is then trained to learn the mapping between the bottleneck features and the corresponding spectral features of the target speaker. The proposed approach captures the speaker-specific characteristics of the target speaker and requires no speech data from the source speaker during training. The method is evaluated on data from three Indian languages: Telugu, Tamil, and Malayalam. The experimental results show that it can effectively convert the source speaker's voice to the target speaker's voice in a cross-lingual scenario.
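
The two-stage data flow described in the abstract (autoencoder encoder to bottleneck features, then a DNN mapping to target-speaker spectral features) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the layer sizes, the tanh activations, the function names, and the random (untrained) weights are all assumptions chosen for illustration, and only the forward pass at conversion time is shown.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
SPECTRAL_DIM = 40     # assumed size of one spectral feature frame
HIDDEN_DIM = 64       # assumed hidden-layer width
BOTTLENECK_DIM = 16   # assumed width of the autoencoder bottleneck


def init_layer(n_in, n_out, rng):
    """Create one dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Apply one dense layer to a single frame (a list of floats)."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out


def encode(frame, encoder):
    """Encoder half of the autoencoder: spectral frame -> bottleneck features."""
    hidden = dense(frame, encoder[0], math.tanh)
    return dense(hidden, encoder[1], math.tanh)


def map_to_target(bottleneck, mapper):
    """Mapping DNN: bottleneck features -> target-speaker spectral frame."""
    hidden = dense(bottleneck, mapper[0], math.tanh)
    return dense(hidden, mapper[1])  # linear output layer


def convert(frames, encoder, mapper):
    """Convert source-speaker frames one at a time through both stages."""
    return [map_to_target(encode(f, encoder), mapper) for f in frames]


rng = random.Random(0)

# Untrained stand-ins for the two models (hypothetical; real weights would
# come from training, which this sketch does not cover).
encoder = [init_layer(SPECTRAL_DIM, HIDDEN_DIM, rng),
           init_layer(HIDDEN_DIM, BOTTLENECK_DIM, rng)]
mapper = [init_layer(BOTTLENECK_DIM, HIDDEN_DIM, rng),
          init_layer(HIDDEN_DIM, SPECTRAL_DIM, rng)]

# Five synthetic "source speaker" frames standing in for real spectral features.
source_frames = [[rng.gauss(0.0, 1.0) for _ in range(SPECTRAL_DIM)]
                 for _ in range(5)]

bottleneck = encode(source_frames[0], encoder)
converted = convert(source_frames, encoder, mapper)
print(len(converted), len(converted[0]), len(bottleneck))  # prints: 5 40 16
```

Because the mapping network only ever sees bottleneck features as input, it can in principle be trained from the target speaker's data alone, which is consistent with the abstract's statement that no source-speaker speech is needed during training.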

Keywords: Cross-lingual voice conversion, Deep autoencoder, Deep neural network, Gaussian mixture model

Paper URL: https://doi.org/10.1007/s11063-019-10149-y