Ancient text recognition: a review

作者：Sonika Rani Narang, M. K. Jindal, Munish Kumar

摘要

Optical character recognition (OCR) is an important research area in the field of pattern recognition. A lot of research has been done on OCR in the last 60 years. There is a large volume of paper-based data in various libraries and offices. Also, there is a wealth of knowledge in the form of ancient text documents. It is a challenge to maintain and search from this paper-based data. At many places, efforts are being done to digitize this data. Paper based documents are scanned to digitize data but scanned data is in pictorial form. It cannot be recognized by computers because computers can understand standard alphanumeric characters as ASCII or some other codes. Therefore, alphanumeric information must be retrieved from scanned images. Optical character recognition system allows us to convert a document into electronic text, which can be used for edit, search, etc. operations. OCR system is the machine replication of human reading and has been the subject of intensive research for more than six decades. This paper presents a comprehensive survey of the work done in the various phases of an OCR with special focus on the OCR for ancient text documents. This paper will help the novice researchers by providing a comprehensive study of the various phases, namely, segmentation, feature extraction and classification techniques required for an OCR system especially for ancient documents. It has been observed that there is a limited work is done for the recognition of ancient documents especially for Devanagari script. This article also presents future directions for the upcoming researchers in the field of ancient text recognition.

论文关键词：OCR, Feature extraction, Classification, Devanagari, Ancient

论文评审过程：

论文官网地址：https://doi.org/10.1007/s10462-020-09827-4