Reading order of Chinese newspaper articles using a block-growing method

Authors:

Highlights:

Abstract

The reading order of articles plays an important role in document analysis and document understanding. Because the reading sequence conveys significant semantic information embedded in a document, it affects the robustness and correctness of downstream applications of document analysis. The reading order, which most researchers and systems ignore, is therefore indispensable for the post-processing needed to understand documents. In this paper, a block-growing approach is proposed that generates the reading order of Chinese newspapers by matching predefined style graphs. In this graph-matching approach, a geometric relation graph (GRG) is first constructed from the geometric relationships among the segmented blocks of the input document. Blocks that belong to the same article are then gradually merged into an article by matching the predefined style graphs. Because the local reading information is retained during the block-growing process, the global reading information can be generated easily. The proposed bottom-up merging approach is powerful and flexible in finding the reading order of Chinese newspaper articles. A wide variety of Chinese newspapers with horizontal and vertical styles were tested to verify the validity of the proposed method. Experimental results demonstrate the feasibility and effectiveness of the proposed approach.
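The pipeline described in the abstract (build a geometric relation graph, then grow blocks into articles by matching style patterns) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Block` fields, the `gap` threshold, the two relations `above` and `right_of`, and especially the `VERTICAL_STYLE` patterns, which stand in for the paper's predefined style graphs (whose actual definition is not given in the abstract). The merge rule is a simple greedy one, not the authors' matching procedure.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    """A segmented page block (hypothetical representation).

    Coordinates are in image space: x grows rightward, y grows downward.
    """
    name: str
    x0: float
    y0: float
    x1: float
    y1: float
    kind: str  # "title" or "text"


def overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def build_grg(blocks, gap=10.0):
    """Build a toy geometric relation graph as a list of directed edges.

    (p, "above", q) means p sits directly above q;
    (p, "right_of", q) means p sits directly to the right of q.
    Two blocks are related only if they are at most `gap` apart (assumed threshold).
    """
    edges = []
    for a, b in combinations(blocks, 2):
        for p, q in ((a, b), (b, a)):
            if 0 <= q.y0 - p.y1 <= gap and overlap(p.x0, p.x1, q.x0, q.x1) > 0:
                edges.append((p.name, "above", q.name))
            if 0 <= p.x0 - q.x1 <= gap and overlap(p.y0, p.y1, q.y0, q.y1) > 0:
                edges.append((p.name, "right_of", q.name))
    return edges


# Hypothetical stand-in for style graphs in a vertical layout, where columns
# are read from right to left: (kind read first, relation, kind read next).
VERTICAL_STYLE = {
    ("title", "right_of", "text"),
    ("text", "right_of", "text"),
}


def grow_articles(blocks, edges, patterns):
    """Greedily merge blocks into ordered articles.

    Each group is a list of block names in local reading order. Two groups
    are joined when an edge links the last block of one to the first block
    of the other and the edge matches a style pattern, so the local order
    is preserved as groups grow.
    """
    kind = {b.name: b.kind for b in blocks}
    groups = [[b.name] for b in blocks]  # every block starts as its own group
    merged = True
    while merged:
        merged = False
        for p, rel, q in edges:
            gp = next(g for g in groups if p in g)
            gq = next(g for g in groups if q in g)
            if gp is gq:
                continue
            if gp[-1] == p and gq[0] == q and (kind[p], rel, kind[q]) in patterns:
                gp.extend(gq)
                groups.remove(gq)
                merged = True
                break  # group membership changed; rescan the edges
    return groups


# Two small vertical-style articles stacked on one page (made-up coordinates).
blocks = [
    Block("T1", 200, 0, 230, 100, "title"),
    Block("A", 100, 0, 195, 100, "text"),
    Block("B", 0, 0, 95, 100, "text"),
    Block("T2", 200, 120, 230, 220, "title"),
    Block("C", 100, 120, 195, 220, "text"),
]
edges = build_grg(blocks)
articles = grow_articles(blocks, edges, VERTICAL_STYLE)
print(articles)
```

In this toy layout the two articles are separated by a vertical gap larger than `gap`, so no edge connects them and each grows independently from its title. A horizontal-style page would need a different pattern set, for example one built on the `above` relation.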

Keywords: Document analysis, Reading order, Block growing, Style graph matching

Review history: Received 3 March 1997, Revised 10 October 1997, Accepted 14 October 1997, Available online 21 August 1998.

Official URL: https://doi.org/10.1016/S0262-8856(97)00077-2