Segmentation and recognition of Arabic characters by structural classification

Authors:

Highlights:

Abstract

Arabic characters differ significantly from other scripts, such as Latin and Chinese, in that they are written cursively in both printed and handwritten forms. The alphabet consists of 28 main characters, but most of their shapes change according to their position in the word. These shapes, together with some additional secondaries, raise the number of classes to 120. Furthermore, some characters share the same shape and are distinguished only by the presence of one, two or three dots above or below them. In this paper, words are first segmented into characters, and the secondaries are then removed using newly developed algorithms. This reduces the number of classes to 32. Information about the secondaries, such as their number, position and type, is recorded and used in the final recognition stage. Features of the skeletonized character are used for classification with a decision tree. A recognition rate of 97.23% is achieved over a set of 4260 samples.
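The final recognition stage described in the abstract, where a secondary-free base shape is combined with the recorded secondary information, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the class names (`BA_SHAPE`, `HA_SHAPE`), the `Secondary` record and the lookup table are hypothetical and cover only a few letters, chosen because their dot patterns are well known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Secondary:
    """Recorded information about a removed secondary (hypothetical layout)."""
    kind: str       # type of secondary; only "dot" is used in this sketch
    count: int      # number of dots: 1, 2 or 3
    position: str   # "above" or "below" the main body


# Hypothetical lookup: (base shape class, secondary) -> letter name.
# The base shape classes stand in for the output of the decision tree.
LETTERS = {
    ("BA_SHAPE", Secondary("dot", 1, "below")): "ba",
    ("BA_SHAPE", Secondary("dot", 2, "above")): "ta",
    ("BA_SHAPE", Secondary("dot", 3, "above")): "tha",
    ("HA_SHAPE", None): "ha",
    ("HA_SHAPE", Secondary("dot", 1, "above")): "kha",
}


def resolve(base_class: str, secondary: Optional[Secondary] = None) -> Optional[str]:
    """Combine a base shape class with its secondary; None if no match."""
    return LETTERS.get((base_class, secondary))


if __name__ == "__main__":
    print(resolve("BA_SHAPE", Secondary("dot", 2, "above")))
    print(resolve("HA_SHAPE"))
```

Keeping the dots out of the shape classifier is what lets a single base class serve several letters; the dot count and position then act as a second key that disambiguates them.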

Keywords: Arabic character recognition, OCR, Segmentation, Feature extraction

Review history: Received 25 July 1995, Revised 21 May 1996, Accepted 23 May 1996, Available online 19 May 1998.

Official paper page: https://doi.org/10.1016/S0262-8856(96)01119-5