Principal component analysis of speech spectrogram images

Authors:

Abstract

Recent research has demonstrated that spectrograms containing human speech utterances can be analyzed using image processing techniques to yield a high recognition rate. In particular, Fourier descriptors (FDs) have proven very useful for characterizing the boundary of segmented isolated words containing the English semivowels /w/, /y/, /l/, and /r/. This study examines the appropriateness of FDs combined with 17 other general features for classifying objects contained in binary spectrogram images. Principal components (PCs) are used for feature reduction on a speaker-dependent data set consisting of 80 sounds representing 20 speaker-dependent words containing English semivowels. With only eight features, including four 32-point FDs and four general features obtained from principal component analysis, a 97.5% recognition rate was obtained. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
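
As an illustration of the two techniques named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes NumPy, a closed object boundary given as an ordered list of (x, y) points, and a generic feature matrix with one row per sound. The function names `fourier_descriptors` and `pca_reduce`, the normalization by the largest non-DC magnitude, and the synthetic demo data are all assumptions made for illustration. Note also that the sketch projects features onto the leading principal components, whereas the abstract suggests the paper used PCs to select a subset of the original features.

```python
import numpy as np


def fourier_descriptors(boundary_xy, n_points=32):
    """Return magnitude Fourier descriptors of a closed 2-D boundary.

    Hypothetical helper: the boundary is resampled to n_points samples
    spaced uniformly in arc length, then transformed with an FFT.
    """
    pts = np.asarray(boundary_xy, dtype=float)
    # Close the contour and compute the cumulative arc length.
    closed = np.vstack([pts, pts[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # Resample uniformly along the contour.
    t = np.linspace(0.0, s[-1], n_points, endpoint=False)
    x = np.interp(t, s, closed[:, 0])
    y = np.interp(t, s, closed[:, 1])
    # Treat each boundary point as a complex number and take the DFT.
    mags = np.abs(np.fft.fft(x + 1j * y))
    # Dropping the DC term removes translation; taking magnitudes removes
    # rotation and starting-point phase; dividing by the largest remaining
    # magnitude removes scale (an assumed normalization choice).
    non_dc = mags[1:]
    return non_dc / non_dc.max()


def pca_reduce(features, n_components):
    """Project rows of `features` onto the leading principal components."""
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Xc @ Vt[:n_components].T, explained[:n_components]


if __name__ == "__main__":
    # Synthetic placeholder data only: 80 rows stand in for 80 sounds,
    # 21 columns for an arbitrary mix of FD and general features.
    rng = np.random.default_rng(0)
    fake_features = rng.normal(size=(80, 21))
    reduced, ratio = pca_reduce(fake_features, n_components=8)
    print(reduced.shape, ratio.sum())

    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(fourier_descriptors(square)[:4])
```

Under these assumptions, a 32-point resampling yields 31 descriptors per object, and the descriptors of a shape are unchanged when the shape is shifted or uniformly scaled.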

Keywords: Principal components, Karhunen-Loeve transform, Fourier descriptors, Cluster analysis, Speech spectrogram

Review history: Available online 7 June 2001.

Official paper URL: https://doi.org/10.1016/S0031-3203(96)00103-3