Singer identification based on computational auditory scene analysis and missing feature methods

作者:Ying Hu, Guizhong Liu

摘要

A major challenge for the identification of singers from monaural popular music recording is to remove or alleviate the influence of accompaniments. Our system is realized in two stages. In the first stage, we exploit computational auditory scene analysis (CASA) to segregate the singing voice units from a mixture signal. First, the pitch of singing voice is estimated to extract the pitch-based features of each unit in an acoustic vector. These features are then exploited to estimate the binary time-frequency (T-F) masks, where 1 indicates that the corresponding T-F unit is dominated by the singing voice, and 0 indicates otherwise. These regions dominated by the singing voice are considered reliable, and other units are unreliable or missing. Thus the acoustic vector is incomplete. In the second stage, two missing feature methods, the reconstruction of acoustic vector and the marginalization, are used to identify the singer by dealing with the incomplete acoustic vectors. For the reconstruction of acoustic vector, the complete acoustic vector is first reconstructed and then converted to obtain the Gammatone frequency cepstral coefficients (GFCCs), which are further used to identify the singer. For the marginalization, the probabilities that the voice belonging to a certain singer are computed on the basis of only the reliable components. We find that the reconstruction method outperforms the marginalization method, while both methods have significantly good performances, especially at signal-to-accompaniment ratios (SARs) of 0 dB and − 3 dB, in contrast to another system.

论文关键词:Computational auditory scene analysis (CASA), Reconstruction, Marginalization, Singer identification

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10844-013-0271-6