Marginal noise removal of document images

作者:

Highlights:

摘要

Marginal noise is a common phenomenon in document analysis which results from the scanning of thick documents or skew documents. It usually appears in the front of a large and dark region around the margin of document images. Marginal noise might cover meaningful document objects, such as text, graphics and forms. The overlapping of marginal noise with meaningful objects makes it difficult to perform the task of segmentation and recognition of document objects. This paper proposes a novel approach to remove marginal noise. The proposed approach consists of two steps which are marginal noise detection and marginal noise deletion. Marginal noise detection will reduce an original document image into a smaller image, and then find marginal noise regions according to the shape length and location of the split blocks. After the detection of marginal noise regions, different removal methods are performed. A local thresholding method is proposed for the removal of marginal noise in gray-scale document images, whereas a region growing method is devised for binary document images. Experimenting with a wide variety of test samples reveals the feasibility and effectiveness of our proposed approach in removing marginal noises.

论文关键词:Document image analysis,Marginal noise removal,Adaptive thresholding,Image segmentation,Preprocessing

论文评审过程:Received 12 April 2001, Accepted 12 October 2001, Available online 21 December 2001.

论文官网地址:https://doi.org/10.1016/S0031-3203(01)00205-9