Neural network vowel-recognition jointly using voice features and mouth shape image
Authors:
Abstract
This paper describes a neural network approach intended to improve the performance of an automatic speech recognition system for unrestricted speakers by using not only voice sound features but also image features of the mouth shape. In particular, we used natural voice signals and mouth shape images acquired in an ordinary environment, neither in a sound-isolated room nor under specific lighting conditions. The FFT power spectrum of the acoustic speech was used as the voice feature. In addition, the gray-level image, binary image, and geometrical shape features of the mouth were used as supplementary information, and we compared which kinds of image features are effective. For unrestricted speakers, a vowel recognition rate of about 80% was obtained using only voice features, but this increased to about 92% when voice features plus binary images were used. This method can be applied not only to improving voice recognition but also to aiding communication for hearing-impaired people.
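As a rough illustration of the feature combination described in the abstract, the sketch below builds one joint input vector from a voice frame and a mouth image. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the frame length, the image size, and the threshold of 128 are all hypothetical, a naive DFT stands in for the FFT (it yields the same power spectrum), and the neural network that would consume the vector is omitted.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum |X[k]|^2 of a real-valued frame via a naive DFT.

    Stand-in for an FFT: same result, O(n^2) cost. Only the
    non-redundant bins k = 0 .. n//2 are returned.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def binarize(gray_image, threshold=128):
    """Flatten a gray-level image (rows of 0-255 values) into 0/1 pixels.

    The fixed threshold is an assumption made for illustration only.
    """
    return [1 if pixel >= threshold else 0 for row in gray_image for pixel in row]


def joint_features(frame, gray_image, threshold=128):
    """Concatenate voice features and binary mouth-image features."""
    return power_spectrum(frame) + binarize(gray_image, threshold)


# Toy data (hypothetical): an 8-sample sinusoid with one cycle per frame,
# and a 2x3 gray-level "mouth" image.
frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]
mouth = [[10, 200, 30], [220, 240, 15]]
features = joint_features(frame, mouth)
```

Here the first five entries of `features` are spectral bins and the remaining six are binary pixels; a classifier trained on such vectors would see both modalities at once, which is the idea the reported improvement rests on.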
Keywords: Voice recognition, Neural network, Image processing, Lip-reading, Mouth shape, Binary image features
Review history: Received 6 March 1990, Revised 12 December 1990, Accepted 25 February 1991, Available online 19 May 2003.
Official URL: https://doi.org/10.1016/0031-3203(91)90089-N