KOHTD: Kazakh offline handwritten text dataset

Authors:

Highlights:

• To establish research on Kazakh handwritten text recognition, a comprehensive dataset of Kazakh handwritten texts is necessary.

• A Genetic Algorithm (GA) is applied to line and word segmentation, based on random enumeration of a parameter.

• Implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines.

• A handwritten Kazakh interpretation task is solved using well-known RNN-based HTR models, such as Flor, Abdallah, Bluche, and Puigcerver.


Keywords: Document analysis and recognition, Handwritten Kazakh text recognition in Cyrillic, Benchmark dataset, Convolutional neural networks, Genetic algorithm, Deep learning

Review history: Received 5 October 2021, Revised 21 June 2022, Accepted 12 July 2022, Available online 16 July 2022, Version of Record 27 July 2022.

Official paper URL: https://doi.org/10.1016/j.image.2022.116827