Character string extraction from color documents

Authors:

Abstract

We propose a new algorithm for extracting character strings from color documents. A full-color image is first divided into several representative binary color images. Character strings are then nominated from each binary image by multi-stage relaxation. The nominated strings are not always characters, however: they may be part of the background, concatenated holes of characters, dotted lines, and so on. As a result, when the nominated strings from all binary images are superimposed, some of them overlap. We therefore select the appropriate strings using the likelihood of a character string and two kinds of conflict resolution. The experiments used color images such as magazine covers and posters. Color segmentation and multi-stage relaxation nominated many character strings, from which adequate strings were then selected. Finally, we present the experimental results and discuss some problems in extracting character strings from color documents.
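The two ends of the pipeline described above (splitting a color image into binary images, then choosing among overlapping nominated strings) can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify the color segmentation, the multi-stage relaxation, the likelihood measure, or the two kinds of conflict resolution. The sketch assumes uniform RGB quantization as a stand-in for color segmentation and a greedy "highest likelihood wins" rule as a stand-in for conflict resolution. All function names, parameters, and the box format are hypothetical.

```python
from collections import Counter


def split_into_binary_images(image, num_colors=2, step=64):
    """Split an image (rows of (r, g, b) tuples) into one binary mask per
    representative color. Assumption: uniform quantization stands in for
    the paper's color segmentation, which the abstract does not describe."""
    def quantize(pixel):
        # Snap each channel to the center of its bin of width `step`.
        return tuple((c // step) * step + step // 2 for c in pixel)

    quantized = [[quantize(p) for p in row] for row in image]
    counts = Counter(p for row in quantized for p in row)
    # Keep the most frequent quantized colors as the representatives.
    representatives = [color for color, _ in counts.most_common(num_colors)]
    return {
        color: [[1 if p == color else 0 for p in row] for row in quantized]
        for color in representatives
    }


def boxes_overlap(a, b):
    """Boxes are (left, top, right, bottom); right and bottom are exclusive."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_strings(candidates):
    """Greedy stand-in for conflict resolution (an assumption, not the
    paper's rule): visit candidates from highest to lowest likelihood and
    drop any candidate whose box overlaps one that was already kept."""
    kept = []
    for box, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if not any(boxes_overlap(box, kept_box) for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

In this sketch each nominated string is a pair of a bounding box and a likelihood score that some earlier stage is assumed to have computed. Producing those candidates from each binary mask is the job of the multi-stage relaxation, which is not modeled here.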

Keywords: Color document, Character string extraction, Color segmentation, Multi-stage relaxation, Conflict resolution, Likelihood of a character string

Article history: Received 27 July 1999, Revised 5 June 2000, Accepted 5 June 2000, Available online 7 June 2001.

Official URL: https://doi.org/10.1016/S0031-3203(00)00081-9