RNN based online handwritten word recognition in Devanagari and Bengali scripts using horizontal zoning

Authors:

Highlights:

• This article proposes a novel approach to online handwritten word recognition, both cursive and non-cursive, in two of the most popular Indian scripts, Devanagari and Bengali. The approach is based on two recently developed versions of the Recurrent Neural Network (RNN): Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BLSTM).

• The proposed approach divides each word horizontally into three zones (upper, middle, and lower) before training on basic strokes with the LSTM and BLSTM versions of RNN. This zone division reduces the variation in the temporal order of basic strokes within a word.

• The major strength of the proposed system is that, unlike most existing word recognition systems for these two scripts, it can also recognize words that are not present in the training dataset, because the classifier is trained with a class labelling scheme based on basic strokes. The proposed system also overcomes various drawbacks of HMM that are common in existing HMM-based word recognition systems.

• Experiments were also carried out on an HMM-based platform, so that the performance of the present system can be compared across HMM-based and RNN-based platforms.

• Experimental results show that the proposed zone segmentation technique, combined with LSTM–BLSTM based learning, outperforms existing word recognition systems for these two Indian scripts, including HMM-based ones.
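
The horizontal zoning step described in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the headline and baseline positions (`headline_y`, `baseline_y`) are already known, that y grows downward as on a typical tablet, and that a whole stroke is assigned to one zone by the mean y-coordinate of its points. The function name `split_into_zones` and all parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y); y is assumed to grow downward
Stroke = List[Point]         # points sampled from pen-down to pen-up


def split_into_zones(
    strokes: List[Stroke], headline_y: float, baseline_y: float
) -> Dict[str, List[Stroke]]:
    """Assign each stroke to the upper, middle, or lower zone.

    Assumption: the zone of a stroke is decided by the mean y of its points.
    Strokes keep their original writing order inside each zone.
    """
    if headline_y >= baseline_y:
        raise ValueError("headline_y must be smaller than baseline_y")

    zones: Dict[str, List[Stroke]] = {"upper": [], "middle": [], "lower": []}
    for stroke in strokes:
        if not stroke:
            continue  # skip empty strokes
        mean_y = sum(y for _, y in stroke) / len(stroke)
        if mean_y < headline_y:
            zones["upper"].append(stroke)
        elif mean_y > baseline_y:
            zones["lower"].append(stroke)
        else:
            zones["middle"].append(stroke)
    return zones


# Toy word with three strokes; coordinates are made up for illustration.
word = [
    [(0.0, 5.0), (1.0, 6.0)],    # above the headline
    [(0.0, 20.0), (5.0, 25.0)],  # between headline and baseline
    [(2.0, 45.0), (3.0, 48.0)],  # below the baseline
]
zones = split_into_zones(word, headline_y=10.0, baseline_y=40.0)
```

Each zone then yields its own stroke sequence, which is one way to obtain the reduced variation in temporal order that the highlights mention. A real system would also have to estimate the zone boundaries and handle strokes that cross them, which this sketch does not attempt.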

Keywords: Online handwriting, Word recognition, Indian scripts, Horizontal zone division, RNN, LSTM, BLSTM

Review history: Received 22 March 2018, Revised 17 March 2019, Accepted 30 March 2019, Available online 1 April 2019, Version of Record 5 April 2019.

Official paper URL: https://doi.org/10.1016/j.patcog.2019.03.030