Proper noun detection in document images

Authors:

Abstract

An algorithm for the detection of proper nouns in document images printed in mixed upper and lower case is presented. Analysis of graphical features of words in a running text is performed to determine words that are likely to be names of specific persons, places, or objects (i.e. proper nouns). This algorithm is a useful addition to contextual post-processing (CPP) or whole word recognition techniques where word images are matched to entries in a dictionary. Due to the difficulty of creating a comprehensive list of proper nouns, a methodology of locating such words prior to recognition will allow for the use of specialized recognition strategies for those words only. Experimental results demonstrate that about 90% of all occurrences of proper nouns were located and over 97% of the unique proper nouns in a document were found using this algorithm.
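
The two figures quoted above measure different things: the first is recall over every occurrence of a proper noun in the running text, the second is recall over the distinct proper nouns in a document. A minimal sketch of how the two could be computed follows. The function name, its inputs, and the toy word lists are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter


def occurrence_and_unique_recall(truth, detected):
    """Return (occurrence-level recall, unique-word recall).

    truth:    one entry per true proper-noun occurrence in the text
    detected: one entry per occurrence the detector flagged
    Both inputs are hypothetical; the paper does not specify this interface.
    """
    if not truth:
        raise ValueError("truth must contain at least one proper noun")
    truth_counts = Counter(truth)
    found_counts = Counter(detected)
    # An occurrence is credited only up to the number of real occurrences.
    located = sum(min(n, found_counts[w]) for w, n in truth_counts.items())
    occurrence_recall = located / sum(truth_counts.values())
    # A unique proper noun counts as found if at least one occurrence was located.
    unique_found = sum(1 for w in truth_counts if found_counts[w] > 0)
    unique_recall = unique_found / len(truth_counts)
    return occurrence_recall, unique_recall


# Toy data (made up for illustration): 10 occurrences of 3 distinct names.
truth = ["London"] * 4 + ["Smith"] * 3 + ["Paris"] * 3
detected = ["London"] * 4 + ["Smith"] * 2 + ["Paris"] * 1
occ, uniq = occurrence_and_unique_recall(truth, detected)
print(occ, uniq)
```

Because a word that recurs needs to be located only once to count as found, unique-word recall can exceed occurrence-level recall, which is consistent with the gap between the two reported figures.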

Keywords: Proper noun detection, Character recognition, Word recognition, Feature extraction, Capitalized word detection, Nearest neighbor classifier

Article history: Received 10 December 1992, Revised 20 September 1993, Accepted 5 October 1993, Available online 19 May 2003.

Official URL: https://doi.org/10.1016/0031-3203(94)90062-0