KOHTD: Kazakh offline handwritten text dataset
Authors:
Highlights:
• To introduce Kazakh handwritten text recognition research, a comprehensive dataset of Kazakh handwritten texts is necessary.
• A Genetic Algorithm (GA) is applied to line and word segmentation, based on random enumeration of a parameter.
• Implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines.
• Solved a handwritten Kazakh interpretation task using well-known RNN models, such as the Flor, Abdallah, Bluche, and Puigcerver HTR models.
Keywords: Document analysis and recognition, Handwritten Kazakh text recognition in Cyrillic, Benchmark dataset, Convolutional neural networks, Genetic algorithm, Deep learning
Review history: Received 5 October 2021, Revised 21 June 2022, Accepted 12 July 2022, Available online 16 July 2022, Version of Record 27 July 2022.
Official paper URL: https://doi.org/10.1016/j.image.2022.116827