A comparison of system architectures for intelligent document understanding

Abstract

Intelligent document understanding (IDU) is the process of converting scanned document images into a high-level representation that describes the document's layout and logical structure, in addition to providing its information content. In this paper we discuss IDU in general and address a specific problem within this domain: extracting the layout structure of pages from a technical journal. Three different architectural approaches to this task are proposed. First, we describe a novel document understanding system (System A) which exploits a hybrid bottom-up/top-down control architecture. The system uses a variety of image processing algorithms in a bottom-up manner. By contrast, a system based on a purely top-down architecture (System B) is then proposed, which segments the page via projection profile analysis and classifies image regions via procedural deduction. Finally, an alternative top-down architecture (System C) is described in which an optimised segmentation scheme is applied to produce partitioned blocks, which are then classified in a goal-driven manner using a decision tree. The three systems are compared by measuring their performance on images obtained from a specific class of input document. Performance is quantified in terms of an object identification rate and the percentage of column area successfully interpreted. On these measures, System A gives results superior to those of the two top-down systems presented. System A also performs significantly better than a previously reported top-down system operating on a comparable problem (Viswanathan, 1990).
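
To illustrate the projection profile analysis mentioned for System B, the following is a minimal sketch and not the implementation used in the paper. It assumes a binarised page stored as nested lists (1 for ink, 0 for background); the names `projection_profile`, `split_on_gaps` and the `min_gap` parameter are hypothetical, introduced only for this example.

```python
from typing import List, Tuple

# Assumption: a binarised page image, 1 = ink pixel, 0 = background.
Image = List[List[int]]


def projection_profile(image: Image, axis: str) -> List[int]:
    """Count ink pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in image]
    if axis == "cols":
        return [sum(col) for col in zip(*image)]
    raise ValueError("axis must be 'rows' or 'cols'")


def split_on_gaps(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) runs of non-empty profile bins.

    Runs separated by fewer than min_gap empty bins are merged, so that
    narrow gaps (e.g. between characters) do not split a block.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ends at an empty bin
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged


# Toy page: two columns of "text" above a full-width line.
page = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

# First cut the page into horizontal bands, then cut the top band into columns.
bands = split_on_gaps(projection_profile(page, "rows"))
top_band = page[bands[0][0]:bands[0][1]]
columns = split_on_gaps(projection_profile(top_band, "cols"))
```

Applying the two cuts alternately and recursively to each resulting block would give a top-down page partition; how System B actually orders and terminates its cuts is not specified in this abstract.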