Multi-oriented touching text character segmentation in graphical documents using dynamic programming
Abstract
The touching character segmentation problem becomes complex when touching strings are multi-oriented. Moreover, in graphical documents, characters within a single touching string sometimes have different orientations, which makes segmentation even more challenging. In this paper, we present a scheme for segmenting English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a large cavity region in the background. Using convex hull information, we first exploit this background information to find initial segmentation points that split a touching string into candidate primitives (a primitive consists of a single character or part of a character). Next, the primitives are merged to obtain the optimum segmentation. A dynamic programming algorithm is applied for this purpose, using the total likelihood of characters as the objective function. An SVM classifier is used to estimate the likelihood of a character. To handle multi-oriented touching strings, the features used in the SVM are invariant to character orientation. Experiments were performed on different databases of real and synthetic touching characters, and the results show that the method is efficient in segmenting touching characters of arbitrary orientations and sizes.
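The primitive-merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the primitives are ordered along the string, that a character is formed from at most `max_merge` consecutive primitives, and that the objective is the sum of log-likelihoods; the abstract only says "total likelihood". The names `best_segmentation`, `score`, `max_merge`, and the toy likelihood table are hypothetical, and in the paper the score would come from the orientation-invariant SVM classifier.

```python
import math
from typing import Callable, List, Tuple


def best_segmentation(
    n: int,
    score: Callable[[int, int], float],
    max_merge: int = 4,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Group n ordered primitives into characters with maximum total score.

    score(i, j) is the (hypothetical) log-likelihood that primitives
    i..j-1 together form one character.
    """
    # best[j] holds the best total score for the first j primitives.
    best = [-math.inf] * (n + 1)
    # back[j] holds the start index of the last group in that solution.
    back = [0] * (n + 1)
    best[0] = 0.0

    for j in range(1, n + 1):
        for i in range(max(0, j - max_merge), j):
            candidate = best[i] + score(i, j)
            if candidate > best[j]:
                best[j] = candidate
                back[j] = i

    # Walk the back-pointers to recover the chosen groups.
    segments: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return best[n], segments


# Toy stand-in for the classifier: made-up likelihoods for a string that
# was cut into 4 primitives. Any group not listed gets a low likelihood.
TOY_LIKELIHOOD = {(0, 2): 0.9, (2, 3): 0.8, (3, 4): 0.7}


def toy_score(i: int, j: int) -> float:
    return math.log(TOY_LIKELIHOOD.get((i, j), 0.1))


total, segments = best_segmentation(4, toy_score)
print(segments)
```

In this toy case the first two primitives are merged into one character and the remaining two are kept as separate characters. Whether raw likelihoods or log-likelihoods are summed changes how strongly the objective favors fewer or more segments, so that choice should be taken from the full paper rather than from this sketch.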
Keywords: Touching character segmentation, Multi-oriented character recognition, Dynamic programming
Review history: Received 22 December 2010, Revised 15 June 2011, Accepted 19 September 2011, Available online 25 November 2011.
Paper URL: https://doi.org/10.1016/j.patcog.2011.09.026