Newspaper layout analysis incorporating connected component separation

Authors:

Abstract

This paper presents an algorithm for the automated segmentation and classification of newspaper images. A notable feature of the algorithm is a technique for separating components that are connected to other components. In particular, horizontal and vertical lines, which can be vital in determining the page layout, can be separated from other lines and from the other components they touch. The algorithm takes a bottom-up approach: it first segments the image, classifies the resulting patterns and extracts text lines, and then merges the classified patterns into complete regions. The algorithm is tested on a set of complex English and Greek newspaper images dating back to 1900.
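The abstract does not say how touching lines are separated, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes a binary image stored as a list of rows (1 = ink), treats any straight run of at least `min_len` foreground pixels as a rule line, and labels what remains as 4-connected components. The names `long_runs_mask`, `label_components`, `separate_lines` and the `min_len` threshold are hypothetical.

```python
from collections import deque


def long_runs_mask(img, min_len, horizontal=True):
    """Mark foreground pixels lying in a straight run of at least min_len pixels."""
    h, w = len(img), len(img[0])
    mask = [[0] * w for _ in range(h)]
    outer, inner = (h, w) if horizontal else (w, h)
    for i in range(outer):
        start = None
        # Iterate one step past the end so a run touching the border is closed.
        for j in range(inner + 1):
            if j < inner:
                on = img[i][j] if horizontal else img[j][i]
            else:
                on = 0
            if on and start is None:
                start = j
            elif not on and start is not None:
                if j - start >= min_len:
                    for k in range(start, j):
                        if horizontal:
                            mask[i][k] = 1
                        else:
                            mask[k][i] = 1
                start = None
    return mask


def label_components(img):
    """Return bounding boxes (top, left, bottom, right) of 4-connected components."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def separate_lines(img, min_len):
    """Split img into horizontal-line, vertical-line and remainder masks (assumed scheme)."""
    hmask = long_runs_mask(img, min_len, horizontal=True)
    vmask = long_runs_mask(img, min_len, horizontal=False)
    rest = [
        [1 if img[r][c] and not hmask[r][c] and not vmask[r][c] else 0
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]
    return hmask, vmask, rest
```

On a page where a small blob touches a horizontal rule, `label_components` on the raw image reports a single component, while running it on the three masks returned by `separate_lines` reports the rule and the blob separately. A fixed run-length threshold is a simplification: it would also cut through wide strokes in large headline type, which a real system would have to handle.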

Keywords: Document analysis, Image segmentation, Newspaper segmentation

Article history: Received 9 April 2003, Revised 6 November 2003, Accepted 13 November 2003, Available online 29 December 2003.

Official URL: https://doi.org/10.1016/j.imavis.2003.11.001