Generalizing edit distance to incorporate domain information: Handwritten text recognition as a case study

Authors:

Abstract

In this paper, the Damerau-Levenshtein string difference metric is generalized in two ways to compensate more accurately for the types of errors that occur in the script recognition domain. First, the basic dynamic programming method for computing such a measure is extended to allow for merges, splits, and two-letter substitutions. Second, edit operations are refined into categories according to the effect they have on the visual “appearance” of words. A set of recognizer-independent constraints is developed to reflect the severity of the information lost due to each operation. These constraints are solved to assign specific costs to the operations. Experimental results on 2335 corrupted strings and a lexicon of 21,299 words show higher correction rates than with the original form of the metric.
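
To make the first generalization concrete, the following is a minimal sketch of a Damerau-Levenshtein recurrence extended with merge (two letters read as one), split (one letter read as two), and two-letter substitution steps. It is an illustration under stated assumptions, not the paper's implementation: the function name `generalized_edit_distance`, the `costs` dictionary, and the unit default costs are all hypothetical, and the sketch applies each operation to any letters, whereas the paper assigns costs by category of visual effect using its constraint system.

```python
def generalized_edit_distance(source, target, costs=None):
    """Edit distance with merge, split and two-letter substitution steps.

    All costs are placeholder values (assumption); the paper derives its
    costs from recognizer-independent constraints, which are not reproduced here.
    """
    c = {
        "insert": 1.0, "delete": 1.0, "substitute": 1.0, "transpose": 1.0,
        "merge": 1.0, "split": 1.0, "pair_substitute": 1.0,
    }
    if costs:
        c.update(costs)

    m, n = len(source), len(target)
    # d[i][j] = cheapest way to turn source[:i] into target[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * c["delete"]
    for j in range(1, n + 1):
        d[0][j] = j * c["insert"]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = source[i - 1] == target[j - 1]
            best = min(
                d[i - 1][j] + c["delete"],
                d[i][j - 1] + c["insert"],
                d[i - 1][j - 1] + (0.0 if same else c["substitute"]),
            )
            if i >= 2:
                # merge: two source letters become one target letter
                best = min(best, d[i - 2][j - 1] + c["merge"])
            if j >= 2:
                # split: one source letter becomes two target letters
                best = min(best, d[i - 1][j - 2] + c["split"])
            if i >= 2 and j >= 2:
                swapped = (source[i - 2] == target[j - 1]
                           and source[i - 1] == target[j - 2])
                if swapped:
                    # Damerau transposition of adjacent letters
                    best = min(best, d[i - 2][j - 2] + c["transpose"])
                # two-letter substitution: a letter pair replaced by another pair
                best = min(best, d[i - 2][j - 2] + c["pair_substitute"])
            d[i][j] = best
    return d[m][n]
```

With these placeholder costs, reading "cl" as "d" counts as a single merge, so `generalized_edit_distance("clear", "dear")` is lower than the two-step delete-plus-substitute path that the plain metric would need. Raising the `merge`, `split`, and `pair_substitute` costs above the cost of the equivalent elementary steps makes the function fall back to ordinary Damerau-Levenshtein behaviour.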

Keywords: String distance, String matching, Spelling error correction, Word recognition and correction, Text editing, Script recognition, Post-processing

Article history: Received 19 January 1995, Revised 30 May 1995, Accepted 3 July 1995, Available online 7 June 2001.

Paper URL: https://doi.org/10.1016/0031-3203(95)00102-6