Machine-printed Japanese document recognition
Authors:
Abstract
Cherry Blossom is a general-purpose Japanese document recognition system developed at CEDAR. Its input can be facsimile pages or images scanned at low resolution. Given a Japanese document image, the system deskews the image, extracts text regions, segments them into text lines and then into characters, and recognizes each character image as a character in JIS code. Two feature sets are used for character classification: the Local Stroke Direction feature and the Gradient, Structural, and Concavity feature. Two classification methods, the nearest neighbor classifier and the minimum error subspace method, were designed and then integrated to achieve better performance. We also describe the new Japanese character image database developed at CEDAR, which consists of approximately 180,000 labeled character images from more than 3300 categories, extracted from diverse document images. Results of our system on this dataset are also presented.
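The abstract does not say how text regions are split into lines and characters. As a rough illustration of that step only, the sketch below uses projection profiles, a common baseline that is assumed here and is not claimed to be the Cherry Blossom method. The names `runs_of_ink`, `segment_lines`, and `segment_characters` are hypothetical, and the input is assumed to be an already deskewed binary image of horizontally written text.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is non-zero."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:                  # run reaches the end of the profile
        runs.append((start, len(profile)))
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Split a page into text lines using the row projection profile."""
    row_profile = [sum(row) for row in image]
    return runs_of_ink(row_profile)


def segment_characters(image: Image, line: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Split one text line into candidate characters using the column profile."""
    top, bottom = line
    width = len(image[0]) if image else 0
    col_profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs_of_ink(col_profile)


# Toy page (assumed data): two text lines with two blobs each.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
chars = [segment_characters(page, line) for line in lines]
```

A plain column profile over-segments Japanese characters that contain internal vertical gaps, such as 川, so a practical system would need to merge candidate pieces using character pitch or recognition feedback.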
Keywords: Machine-printed document recognition, Japanese OCR, Japanese character image database
Article history: Received 10 July 1996, Revised 21 October 1996, Available online 7 June 2001.
Paper URL: https://doi.org/10.1016/S0031-3203(96)00168-9