Principal component analysis of speech spectrogram images
Abstract
Recent research has demonstrated that spectrograms of human speech utterances can be analyzed with image processing techniques to yield a high recognition rate. In particular, Fourier descriptors (FDs) have proven very useful for characterizing the boundary of segmented isolated words containing the English semivowels /w/, /y/, /l/, and /r/. This study examines the appropriateness of FDs combined with 17 other general features for classifying objects contained in binary spectrogram images. Principal components (PCs) are used for feature reduction on a speaker-dependent data set consisting of 80 sounds representing 20 speaker-dependent words containing English semivowels. With only eight features, including four 32-point FDs and four general features obtained from principal component analysis, a 97.5% recognition rate was obtained. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
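The two techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fourier_descriptors` and `pca_reduce`, the arc-length resampling step, and the choice to normalize by the largest non-DC magnitude are all assumptions made for illustration. The abstract states only that 32-point FDs and principal components were used.

```python
import numpy as np

def fourier_descriptors(boundary_xy, n_points=32):
    """Magnitude Fourier descriptors of a closed boundary (illustrative sketch).

    boundary_xy: (N, 2) array of boundary points in traversal order.
    Returns n_points - 1 values that do not change under translation,
    uniform scaling, or rotation of the boundary.
    """
    pts = np.asarray(boundary_xy, dtype=float)
    closed = np.vstack([pts, pts[:1]])            # close the contour
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length
    # Assumption: resample uniformly in arc length to a fixed number of points.
    t = np.linspace(0.0, s[-1], n_points, endpoint=False)
    z = np.interp(t, s, closed[:, 0]) + 1j * np.interp(t, s, closed[:, 1])
    mags = np.abs(np.fft.fft(z))[1:]              # drop DC term: translation
    return mags / mags.max()                      # assumed scale normalization

def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components.

    Returns the projected data and the fraction of variance explained
    by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Xc @ Vt[:n_components].T, explained[:n_components]
```

Under these assumptions, a feature matrix with one row per sound and one column per general feature (80 by 17 for the data set described above) could be passed to `pca_reduce(X, 4)`, and the result concatenated with four selected FDs per object to form an eight-feature vector. How the four FDs were selected is not stated in the abstract.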
Keywords: Principal components, Karhunen-Loeve transform, Fourier descriptors, Cluster analysis, Speech spectrogram
Publication history: Available online 7 June 2001.
Paper URL: https://doi.org/10.1016/S0031-3203(96)00103-3