Using Shape and Layout Information to Find Signatures, Text, and Graphics
Abstract
Decomposing a page image into text and various types of nontext elements is a challenging problem that matters for document analysis tasks such as optical character recognition, storage and retrieval, and identifying the sender and recipient of a FAX. A fast classifier based on a skeletonization of the image attempts to classify groups of related line segments as text, ruling lines, signatures, other line art, or miscellaneous items. Everything classified as text is then processed by Baird's language-free layout analysis system, so that a postprocessor can use the geometric layout to refine its decisions about what is text and what is nontext. The result could then be processed further to identify complex objects such as tables, signature blocks, and line drawings. To recognize signatures and separate them from ruling lines and components of line drawings, the line segments produced by skeletonization must be strung together by a curve-fitting process. After long, fairly straight lines are found and set aside, a more lenient criterion strings together pairs of segments to form the groups on which the fast classifier is run.
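The abstract does not specify the curve-fitting test or the lenient pairing criterion, so the following Python code is only a hypothetical sketch of the two-stage grouping idea, not the authors' method. It assumes each skeleton segment arrives as an ordered list of (x, y) points. As a stand-in straightness test it checks chord length and the maximum deviation of points from the chord. It treats "lenient" as endpoint proximity and chains paired segments into groups with union-find. The names `group_segments`, `max_deviation`, `is_long_straight` and the thresholds `min_length`, `max_dev`, `max_gap` are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polyline = List[Point]  # one skeleton segment, points in drawing order (assumed input format)


def max_deviation(seg: Polyline) -> float:
    """Largest perpendicular distance of any point from the chord joining the endpoints."""
    (x0, y0), (x1, y1) = seg[0], seg[-1]
    length = math.dist(seg[0], seg[-1])
    if length == 0:  # closed loop: fall back to distance from the start point
        return max(math.dist(seg[0], p) for p in seg)
    return max(abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / length
               for x, y in seg)


def is_long_straight(seg: Polyline, min_length: float, max_dev: float) -> bool:
    """Strict test (hypothetical): long chord and every point close to it."""
    return math.dist(seg[0], seg[-1]) >= min_length and max_deviation(seg) <= max_dev


def group_segments(segments: List[Polyline], min_length: float = 100.0,
                   max_dev: float = 2.0, max_gap: float = 5.0):
    """Set aside long straight lines, then chain the remaining segments into groups."""
    straight = [is_long_straight(s, min_length, max_dev) for s in segments]
    lines = [s for s, f in zip(segments, straight) if f]
    rest = [s for s, f in zip(segments, straight) if not f]

    parent = list(range(len(rest)))  # union-find over the remaining segments

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Lenient criterion (assumed): join two segments if any endpoint pair is within max_gap.
    for i in range(len(rest)):
        for j in range(i + 1, len(rest)):
            ends_i = (rest[i][0], rest[i][-1])
            ends_j = (rest[j][0], rest[j][-1])
            if min(math.dist(a, b) for a in ends_i for b in ends_j) <= max_gap:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Polyline]] = {}
    for i, seg in enumerate(rest):
        groups.setdefault(find(i), []).append(seg)
    return lines, list(groups.values())


segments = [
    [(0, 0), (50, 0), (150, 0)],     # long horizontal rule
    [(10, 20), (15, 25), (20, 20)],  # short curved stroke
    [(22, 21), (30, 30)],            # starts about 2.2 px from the previous stroke's end
    [(200, 200), (210, 205)],        # isolated stroke
]
lines, groups = group_segments(segments)
# lines holds the rule; groups have sizes [2, 1]
```

In this sketch each resulting group would be the unit handed to the fast classifier. A practical version would tune the thresholds to scan resolution and might also compare stroke directions at the joined endpoints. The quadratic pair loop is only reasonable for small segment sets.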
Review history: Received 16 November 1999, Accepted 16 June 2000, Available online 26 March 2002.
DOI: https://doi.org/10.1006/cviu.2000.0868