Page segmentation using texture analysis

Abstract

We propose a new texture-based, language-free page segmentation algorithm that automatically extracts the text, halftone, and line-drawing regions from input greyscale document images. The approach uses a neural network to train a set of masks that is optimal for discriminating the three main texture classes in the page segmentation problem: halftone, background, and text and line-drawing regions. The text and line-drawing regions are then separated from each other by connectivity analysis. We have applied the algorithm to successfully segment English and Chinese document images. We also demonstrate that the masks can perform language separation (English/Chinese) when appropriately trained.
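
The two stages named in the abstract, mask filtering followed by connectivity analysis, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the 4-connectivity choice, and the bounding-box rule with its `max_text_extent` threshold are all assumptions made for illustration, and the masks in the paper are learned by a neural network rather than supplied by hand as they would be here.

```python
from collections import deque


def mask_response(image, mask):
    """Slide a square mask over a greyscale image (valid region only).

    The paper learns its masks with a neural network; here the mask is
    simply passed in by the caller (an assumption for illustration).
    """
    k = len(mask)
    return [
        [
            sum(mask[i][j] * image[r + i][c + j]
                for i in range(k) for j in range(k))
            for c in range(len(image[0]) - k + 1)
        ]
        for r in range(len(image) - k + 1)
    ]


def connected_components(binary):
    """Return the 4-connected foreground components as lists of (row, col)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not binary[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and binary[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            components.append(pixels)
    return components


def split_by_extent(binary, max_text_extent):
    """Hypothetical rule: a component whose bounding box is longer than
    max_text_extent pixels is called a line drawing, otherwise text."""
    text, drawings = [], []
    for pixels in connected_components(binary):
        row_idx = [p[0] for p in pixels]
        col_idx = [p[1] for p in pixels]
        extent = max(max(row_idx) - min(row_idx),
                     max(col_idx) - min(col_idx)) + 1
        (drawings if extent > max_text_extent else text).append(pixels)
    return text, drawings
```

For example, calling `split_by_extent(page, max_text_extent=3)` on a small binary map containing a 2×2 blob and an 8-pixel horizontal stroke would place the blob in the text list and the stroke in the line-drawing list. A real system would pick the threshold relative to the expected character size.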

Keywords: Document analysis, Neural network, Page segmentation, Texture, Learning

Review history: Received 11 April 1995, Revised 7 August 1995, Accepted 29 August 1995, Available online 7 June 2001.

Official paper URL: https://doi.org/10.1016/0031-3203(95)00131-X