Semantic information extraction from images of complex documents

Authors: Claudio Antonio Peanho, Henrique Stagni, Flavio Soares Correa da Silva

Abstract

Even though the digital processing of documents is increasingly widespread in industry, printed documents are still widely used. To process the contents of printed documents electronically, information must be extracted from digital images of those documents. In complex documents, the contents of different regions and fields can be highly heterogeneous with respect to layout, printing quality, and the use of fonts and typing standards, so reconstructing the contents of a document from a digital image can be a difficult problem. In this article we present an efficient solution to this problem, in which the semantic contents of the fields in a complex document are extracted from a digital image.

Keywords: Document image processing, Information extraction from documents

Paper URL: https://doi.org/10.1007/s10489-012-0348-x