Text region extraction in a document image based on the Delaunay tessellation
Authors:
Abstract
This paper applies Delaunay triangulation to the extraction of text areas in a document image. Each connected component in the image is represented by its centroid, so the page structure is described as a set of points in two-dimensional space. When Delaunay triangulation is imposed on these points, the triangles in text regions show features that distinguish them from the triangles in image and drawing regions. For analysis, the Delaunay triangles are divided into four classes. The study reveals that specific triangles in text areas can be clustered together and identified as text body. With this method, text regions can be recognized accurately even in a document image that contains fragments. Experiments indicate that the method is also computationally efficient.
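The first two steps of the abstract (centroids of connected components, then a Delaunay triangulation over them) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the 4-connectivity choice, the brute-force empty-circumcircle test, and the use of the longest side as a triangle feature are all assumptions made for illustration; the abstract does not define the four triangle classes, so no classification is attempted. The brute-force test assumes points in general position (no four points on one circle) and is only practical for small point sets.

```python
from collections import deque
from itertools import combinations
from math import dist


def component_centroids(image):
    """Return the (x, y) centroid of each 4-connected foreground component.

    `image` is a list of rows of 0/1 values (hypothetical input format).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects the pixels of one component.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            cx = sum(x for _, x in pixels) / len(pixels)
            cy = sum(y for y, _ in pixels) / len(pixels)
            centroids.append((cx, cy))
    return centroids


def circumcircle(a, b, c):
    """Return ((ux, uy), squared_radius), or None for collinear points."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2 = a[0] ** 2 + a[1] ** 2
    b2 = b[0] ** 2 + b[1] ** 2
    c2 = c[0] ** 2 + c[1] ** 2
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy), (a[0] - ux) ** 2 + (a[1] - uy) ** 2


def delaunay_triangles(points):
    """Brute-force Delaunay: keep triples whose circumcircle is empty."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for n, (px, py) in enumerate(points) if n not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles


def longest_side(points, triangle):
    """Length of the longest edge, one plausible (assumed) triangle feature."""
    i, j, k = triangle
    return max(dist(points[i], points[j]),
               dist(points[j], points[k]),
               dist(points[i], points[k]))
```

In use, `component_centroids` would be run on a binarized page, its output passed to `delaunay_triangles`, and a feature such as `longest_side` computed per triangle before any clustering. For real page images, a library triangulation would replace the brute-force routine, which tests every triple of points against every other point.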
Keywords: Delaunay triangulation, Page segmentation, Document image analysis
Article history: Received 23 May 2001, Accepted 1 April 2002, Available online 4 June 2002.
Official paper URL: https://doi.org/10.1016/S0031-3203(02)00082-1