A comparison of system architectures for intelligent document understanding

Authors:


Abstract

Intelligent document understanding (IDU) is the process of converting scanned document images into a high-level representation that describes the document's layout and logical structure and also provides its information content. In this paper we discuss IDU in general and address a specific problem within this domain: extracting the layout structure of pages from a technical journal. Three different architectural approaches to this task are proposed. First, we describe a novel document understanding system (System A) that exploits a hybrid bottom-up/top-down control architecture and applies a variety of image processing algorithms in a bottom-up manner. By contrast, we then propose a system based on a pure top-down architecture (System B), which segments the page via projection profile analysis and classifies image regions via procedural deduction. Finally, we describe an alternative top-down architecture (System C), in which an optimised segmentation scheme partitions the page into blocks that are then classified in a goal-driven manner using a decision tree. The three systems are compared by measuring their performance on images of a specific class of input document. Performance is quantified in terms of an object identification rate and the percentage of column area successfully interpreted. On these measures, System A gives superior results to the two top-down systems presented. System A also performs significantly better than a previously reported top-down system operating on a comparable problem (Viswanathan, 1990).
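The abstract says only that System B segments the page "via projection profile analysis" and gives no further detail. As a minimal sketch of what that technique involves, assuming a binary page image stored as a list of rows with 1 for foreground (ink) pixels, one level of a projection-profile split might look like the following. It first cuts the page into horizontal bands at sufficiently wide runs of empty rows, then cuts each band into blocks at wide runs of empty columns, in the spirit of the classic X-Y cut. The function names and the `min_gap` thresholds are hypothetical illustrations, not System B's actual algorithm or parameters.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical representation: rows of 0/1 pixels


def horizontal_profile(image: Image) -> List[int]:
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def vertical_profile(image: Image) -> List[int]:
    """Count foreground (1) pixels in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def split_on_gaps(profile: List[int], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) ranges (end exclusive) of non-empty runs.

    A run of at least `min_gap` zero entries separates two segments;
    shorter gaps (e.g. spacing inside a text line) are absorbed.
    Leading and trailing empty entries are trimmed from each segment.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current segment began
    gap = 0       # length of the current run of empty entries
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Close the segment just before the gap began.
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        segments.append((start, len(profile) - gap))
    return segments


def segment_page(image: Image, min_row_gap: int = 2,
                 min_col_gap: int = 2) -> List[Tuple[int, int, int, int]]:
    """One level of projection-profile segmentation.

    Returns blocks as (top, bottom, left, right), bottom/right exclusive.
    """
    blocks = []
    for top, bottom in split_on_gaps(horizontal_profile(image), min_row_gap):
        band = image[top:bottom]
        for left, right in split_on_gaps(vertical_profile(band), min_col_gap):
            blocks.append((top, bottom, left, right))
    return blocks


if __name__ == "__main__":
    # A toy "page": two short columns above a full-width line.
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 1, 1, 0],
        [1, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 0],
    ]
    print(segment_page(page))  # [(1, 3, 0, 2), (1, 3, 5, 7), (5, 6, 0, 7)]
```

A practical segmenter would typically recurse, alternating between horizontal and vertical cuts until no further gaps are found, and would tune the gap thresholds to the expected inter-line and inter-column spacing of the journal being processed; the resulting blocks would then be passed to a classification stage.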

Keywords: Document understanding, Page layout analysis, Document image processing, ODA

Article history: Received 15 December 1994; available online 19 May 1998.

Official paper URL: https://doi.org/10.1016/S0923-5965(96)00002-1