Connected and degraded text recognition using hidden Markov model

Authors:

Abstract

We have applied a hidden Markov model (HMM) and a level-building dynamic programming algorithm to the problem of robust machine recognition of connected and degraded characters forming words in poorly printed text. The recognition system consists of preprocessing, subcharacter segmentation, and feature extraction, followed by supervised learning or recognition. A structural analysis algorithm segments a word into subcharacter segments irrespective of the character boundaries and identifies the primitive features in each segment, such as strokes and arcs. The states of the HMM for each character are statistically represented by the subcharacter segments, and the state characteristics are obtained by estimating the state probability functions from the training samples. To recognize an unknown word, subcharacter segmentation and feature extraction are performed, and the transition probabilities between character models govern the transitions between characters in the string. The level-building dynamic programming algorithm combines segmentation and recognition of the word in one operation and chooses the most probable grouping of characters for the unknown word. Computer experiments demonstrate the robustness and effectiveness of the new system for recognizing words formed by degraded and connected characters.
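The joint segmentation and recognition step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each subcharacter segment has already been reduced to one discrete feature symbol (`"("` for a left arc, `")"` for a right arc, `"|"` for a stroke), that each character is a left-to-right HMM with a fixed self-loop probability, and that character-to-character transition probabilities are uniform. The names `char_viterbi`, `level_build`, `MODELS`, and `BIGRAM`, and all probability values, are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def char_viterbi(states, obs, stay=0.4):
    """Best log-score of one left-to-right character HMM over obs.

    states: list of dicts mapping feature symbol -> emission probability.
    The path must start in the first state and end in the last one.
    `stay` is an assumed self-loop probability shared by all states.
    """
    n = len(states)
    if len(obs) < n:
        return NEG_INF
    score = [NEG_INF] * n
    score[0] = log(states[0].get(obs[0], 0.0))
    for sym in obs[1:]:
        new = [NEG_INF] * n
        for s in range(n):
            best_prev = score[s] + log(stay)            # remain in state s
            if s > 0:                                   # or advance from s-1
                best_prev = max(best_prev, score[s - 1] + log(1 - stay))
            new[s] = best_prev + log(states[s].get(sym, 0.0))
        score = new
    return score[-1]

def level_build(obs, models, bigram, max_chars):
    """Level l holds the best ways to explain a prefix with exactly l characters."""
    T = len(obs)
    # D[l][j][c]: best log-score for obs[:j] using l characters, the last being c
    D = [[{} for _ in range(T + 1)] for _ in range(max_chars + 1)]
    back = {}
    D[0][0]["^"] = 0.0  # "^" is an assumed start-of-word marker
    for l in range(1, max_chars + 1):
        for j in range(1, T + 1):
            for c, states in models.items():
                best, arg = NEG_INF, None
                for i in range(j):  # candidate character boundary
                    for p, prev in D[l - 1][i].items():
                        s = (prev + log(bigram.get((p, c), 0.0))
                             + char_viterbi(states, obs[i:j]))
                        if s > best:
                            best, arg = s, (i, p)
                if arg is not None:
                    D[l][j][c] = best
                    back[(l, j, c)] = arg
    ends = [(D[l][T][c], l, c) for l in range(1, max_chars + 1) for c in D[l][T]]
    if not ends:
        return None, NEG_INF
    score, l, c = max(ends)
    word, j = [], T
    while l > 0:  # follow back-pointers to recover the character string
        word.append(c)
        j, c = back[(l, j, c)]
        l -= 1
    return "".join(reversed(word)), score

# Toy character models (hypothetical emission probabilities).
MODELS = {
    "o": [{"(": 0.9, "|": 0.1}, {")": 0.9, "|": 0.1}],
    "l": [{"|": 1.0}],
    "d": [{"(": 0.9, "|": 0.1}, {"|": 0.9, ")": 0.1}],
}
# Uniform character transition probabilities, including from the start marker.
BIGRAM = {(p, c): 1 / 3 for p in "^old" for c in "old"}

obs = ["(", "|", "(", ")"]  # four subcharacter segments, boundaries unknown
word, score = level_build(obs, MODELS, BIGRAM, max_chars=4)
print(word)  # do
```

In this sketch the character boundary after the second segment is never given explicitly. It falls out of the maximization over `i`, which is the sense in which segmentation and recognition happen in one operation.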

Keywords: Connected and degraded character recognition, Word shape analysis, Hidden Markov Model, Statistical and structural character recognition, Viterbi algorithm, Dynamic programming, Segmentation

Review history: Received 23 May 1993, Revised 8 February 1994, Accepted 24 February 1994, Available online 19 May 2003.

Official paper URL: https://doi.org/10.1016/0031-3203(94)90069-8