Automatic recognition of printed Farsi texts

Authors:


Abstract

The automatic recognition of printed Farsi (Persian) texts is complicated by several properties of the Farsi script: (a) connectivity of symbols, (b) similarity of groups of symbols, (c) highly variable widths, (d) subword overlap, and (e) line overlap. In this paper, a technique for the automatic recognition of printed Farsi texts is presented and its steps are discussed as follows: (1) digitization, (2) editing, (3) line separation, (4) subword separation, (5) symbol separation, (6) recognition, and (7) postprocessing. The most notable contributions of this work are in algorithms for steps (5) and (6) above. Practical application of the technique to Farsi newspaper headlines has been 100% successful. However, smaller type fonts, which could not be handled by the coarse digitization hardware used, will no doubt result in less than perfect recognition. The technique is also applicable with little or no modification to printed Arabic and Urdu texts which use the same alphabet as Farsi.
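Step (3), line separation, can be illustrated with a generic row-projection sketch. This is not the paper's algorithm, which the abstract does not describe; it is a minimal illustration that assumes a binarized page (1 = ink, 0 = background) and at least one blank row between text lines. The function name `separate_lines` and the sample `page` are hypothetical. Because the abstract lists line overlap as a complication of Farsi script, a real system would need more than this blank-row test.

```python
from typing import List, Tuple


def separate_lines(image: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows that contain ink.

    Assumption: the image is binary (1 = ink, 0 = background) and text lines
    are separated by at least one completely blank row.
    """
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the text line currently being scanned
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                  # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y))   # a blank row closes the line
            start = None
    if start is not None:              # the last line touches the bottom edge
        lines.append((start, len(image)))
    return lines


# Hypothetical 5x4 page: one two-row line, a blank row, then a one-row line.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
print(separate_lines(page))  # [(1, 3), (4, 5)]
```

The same scan applied to columns within a line gives a first approximation of subword separation, step (4), although subword overlap would defeat it in the same way that line overlap defeats the row scan.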

Keywords: Character recognition, Computer input, Document input, Farsi, Feature selection, Optical character recognition, Pattern recognition, Persian, Printed text recognition

Review history: Received 9 January 1980, Accepted 22 December 1980, Available online 19 May 2003.

Official paper URL: https://doi.org/10.1016/0031-3203(81)90084-4