A hierarchical approach to recognition of handwritten Bangla characters

Abstract

A novel hierarchical approach is presented here for optical character recognition (OCR) of handwritten Bangla words. Instead of dealing with isolated characters, as in selected earlier works [T.K. Bhowmik, U. Bhattacharya, S.K. Parui, Recognition of Bangla handwritten characters using an MLP classifier based on stroke features, in: Proceedings of the ICONIP, Kolkata, India, 2004, pp. 814–819; K. Roy, U. Pal, F. Kimura, Bangla handwritten character recognition, in: Proceedings of the Second Indian International Conference on Artificial Intelligence (IICAI), 2005, pp. 431–443; S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D.K. Basu, Handwritten Bangla alphabet recognition using an MLP based classifier, in: Proceedings of the Second National Conference on Computer Processing of Bangla, Dhaka, 2005, pp. 285–291; A.F.R. Rahman, R. Rahman, M.C. Fairhurst, Recognition of handwritten Bengali characters: a novel multistage approach, Pattern Recognition 35, 2002, pp. 997–1006; U. Bhattacharya, S.K. Parui, M. Sridhar, F. Kimura, Two-stage recognition of handwritten Bangla alphanumeric characters using neural classifiers, in: Proceedings of the Second Indian International Conference on Artificial Intelligence (IICAI), 2005, pp. 1357–1376; U. Bhattacharya, M. Sridhar, S.K. Parui, On recognition of handwritten Bangla characters, in: Proceedings of the ICVGIP-06, Lecture Notes in Computer Science, vol. 4338, 2006, pp. 817–828], the present approach first segments a word image on the Matra hierarchy, then recognizes the individual word segments, and finally identifies the constituent characters of the word image by intelligently combining the recognition decisions of the associated word segments. Because consecutive characters of a Bangla word may occupy overlapping character positions, segmentation of Bangla word images is difficult. For successful OCR of handwritten Bangla text, segmentation of word images is therefore as important as recognition.
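The abstract does not spell out how the Matra (headline) is located or how a word is split beneath it. As a rough illustration only, the sketch below assumes a binary word image stored as a list of rows (1 = ink), takes the row with the most ink pixels as the Matra, ignores a thin band around that row, and cuts the word at ink-free columns. The function names `find_matra_row` and `segment_below_matra` and the `band` parameter are hypothetical; this is a simple projection heuristic, not the paper's segmentation procedure, and it cannot separate characters whose positions overlap horizontally, which is exactly the difficulty noted above.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary word image, one list per row: 1 = ink, 0 = background


def find_matra_row(img: Image) -> int:
    """Hypothetical heuristic: the Matra is taken to be the row with the most ink pixels."""
    counts = [sum(row) for row in img]
    # max() returns the first row index among ties
    return max(range(len(img)), key=lambda r: counts[r])


def segment_below_matra(img: Image, band: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) column ranges of word segments.

    Rows within `band` of the Matra row are ignored, so the headline no longer
    joins neighbouring characters; the word is then cut at columns with no ink.
    """
    matra = find_matra_row(img)
    rows = [r for r in range(len(img)) if abs(r - matra) > band]
    width = len(img[0])
    has_ink = [any(img[r][c] for r in rows) for c in range(width)]

    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c  # a segment begins
        elif not ink and start is not None:
            segments.append((start, c - 1))  # a segment ends at the previous column
            start = None
    if start is not None:
        segments.append((start, width - 1))  # segment touching the right border
    return segments
```

For a toy 5 × 7 image whose second row is solid ink and whose strokes below it occupy column 1 and columns 4 to 5, `find_matra_row` returns 1 and `segment_below_matra` returns `[(1, 1), (4, 5)]`.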
In this respect, the present hierarchical approach addresses both segmentation and recognition of handwritten Bangla word images, offering a complete solution to the handwritten word recognition problem, an essential part of OCR of handwritten Bangla text. For certain categories of word segments created on the Matra hierarchy, a more sophisticated recognition technique, viz., the two-pass approach [S. Basu, C. Chaudhury, M. Kundu, M. Nasipuri, D.K. Basu, A two pass approach to pattern classification, in: N.R. Pal et al. (Eds.), Lecture Notes in Computer Science, vol. 3316, ICONIP, Kolkata, 2004, pp. 781–786], is employed. The degree of sophistication of the classification technique is tuned to the category of word segment to be recognized: for example, the two-pass approach is used to recognize middle zone character segments, whereas middle zone modified shapes of Bangla script are recognized through simple template matching. Given the learning and generalization abilities of multilayer perceptrons (MLPs), MLP-based pattern classifiers are used for most of the classification tasks. A powerful feature set is also designed in this work for recognizing complex character patterns, using three types of topological features, viz., longest-run features, modified shadow features and octant-centroid features. In a nutshell, the work addresses a practical problem of OCR of Bangla text, involving both segmentation and recognition of the constituent characters of handwritten Bangla words.
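To make the first of the three topological features more concrete, here is a minimal sketch of a longest-run feature, assuming it is computed as the sum, over all rows (and likewise over columns, diagonals and anti-diagonals), of the length of the longest unbroken run of ink pixels. The abstract does not give the exact definition, including any division of the character image into sub-regions or any normalization, so the function names below are hypothetical and the features are computed on the whole image for brevity; a region-wise variant would simply call `longest_run_features` on each sub-image.

```python
from typing import List

Image = List[List[int]]  # binary character image, one list per row: 1 = ink, 0 = background


def longest_run(line: List[int]) -> int:
    """Length of the longest run of consecutive ink pixels in one scan line."""
    best = current = 0
    for pixel in line:
        current = current + 1 if pixel else 0  # extend the run or reset it
        best = max(best, current)
    return best


def longest_run_features(img: Image) -> List[int]:
    """Return four features: summed longest runs over rows, columns,
    diagonals (top-left to bottom-right) and anti-diagonals."""
    h, w = len(img), len(img[0])
    rows = [img[r] for r in range(h)]
    cols = [[img[r][c] for r in range(h)] for c in range(w)]
    # cells with r - c == d lie on the same diagonal, d in [-(w - 1), h - 1]
    diags = [[img[r][r - d] for r in range(h) if 0 <= r - d < w]
             for d in range(-(w - 1), h)]
    # cells with r + c == s lie on the same anti-diagonal, s in [0, h + w - 2]
    anti = [[img[r][s - r] for r in range(h) if 0 <= s - r < w]
            for s in range(h + w - 1)]
    return [sum(longest_run(line) for line in group)
            for group in (rows, cols, diags, anti)]
```

For the 3 × 3 image `[[1, 1, 0], [0, 1, 1], [0, 0, 1]]`, every direction happens to sum to 5, so `longest_run_features` returns `[5, 5, 5, 5]`; in a full pipeline such vectors would be concatenated with the other feature types and fed to the MLP classifiers.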

Keywords: Handwritten Bangla character recognition, Hierarchical classification, Two-pass approach to pattern classification, Multilayer perceptron

Article history: Received 27 August 2007, Revised 25 December 2008, Accepted 7 January 2009, Available online 17 January 2009.

Official paper URL: https://doi.org/10.1016/j.patcog.2009.01.008