A formal theory for optimal and information theoretic syntactic pattern recognition

作者:

Highlights:

摘要

In this paper we present a foundational basis for optimal and information theoretic syntactic pattern recognition. We do this by developing a rigorous model, m∗, for channels which permit arbitrarily distributed substitution, deletion and insertion syntactic errors. More explicitly, if A is any finite alphabet and A∗ the set of words over A, we specify a stochastically consistent scheme by which a string U∈A∗ can be transformed into any Y∈A∗ by means of arbitrarily distributed substitution, deletion and insertion operations. The scheme is shown to be functionally complete and stochastically consistent. Apart from the synthesis aspects, we also deal with the analysis of such a model and derive a technique by which Pr[Y∣U], the probability of receiving Y given that U was transmitted, can be computed in cubic time using dynamic programming. One of the salient features of this scheme is that it demonstrates how dynamic programming can be applied to evaluate quantities involving complex combinatorial expressions and which also maintain rigid probability consistency constraints. Experimental results which involve dictionaries with strings of lengths between 7 and 14 with an overall average noise of 39.75% demonstrate the superiority of our system over existing methods. Apart from its straightforward applications in string generation and recognition, we believe that the model also has extensive potential applications in speech, uni-dimensional signal processing, and computational molecular biology.

论文关键词:Syntactic pattern recognition,Optimal and information theoretic pattern recognition String generation,Unigram and bigram model,Sequence generation,Markovian sequence generation

论文评审过程:Received 4 March 1997, Revised 1 October 1997, Available online 22 October 2001.

论文官网地址:https://doi.org/10.1016/S0031-3203(97)00124-6