Multimodal interactive transcription of text images

Authors:

Abstract:

To date, automatic handwriting recognition systems remain far from perfect, and substantial human intervention is often required to check and correct their results. This “post-editing” process is both inefficient and uncomfortable for the user. An example is the transcription of historical documents: state-of-the-art handwritten text recognition technology cannot perform this task automatically, and expensive work by paleography experts is needed to obtain correct transcriptions. As an alternative to fully manual transcription and to post-editing, a multimodal interactive approach is proposed here in which user feedback is provided by means of touchscreen pen strokes and/or more traditional keyboard and mouse operation. The user's feedback directly improves system accuracy, while multimodality increases system ergonomics and user acceptability. Multimodal interaction is approached in such a way that the main and the feedback data streams help each other to optimize overall performance and usability. Empirical tests on three cursive handwriting tasks suggest that, with this approach, considerable amounts of user effort can be saved with respect to both fully manual work and non-interactive post-editing.
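
The abstract describes an interaction in which the system's hypothesis and the user's feedback constrain one another. As a purely illustrative sketch, the Python code below simulates one common formulation of such a loop: the user validates a correct prefix of the system hypothesis and corrects the next word (e.g. via decoded pen strokes), and the system re-predicts the remaining suffix conditioned on the validated prefix. The function names, toy decoders, and file name are assumptions for illustration only, not the authors' implementation.

```python
"""Toy sketch of a prefix-based interactive transcription loop.

All names (predict_suffix, decode_pen_strokes, ...) are illustrative
placeholders, not the paper's actual interface.  A simulated "user"
(the reference transcription) validates the longest correct prefix of
each system hypothesis and corrects the first wrong word, e.g. by
writing it with a pen; the system then re-predicts the suffix
conditioned on the validated prefix.
"""

from typing import List

# Canned full hypothesis standing in for an off-line HTR decoder.
_CANNED = "the quick brown fox jumps over the lazy dog".split()


def predict_suffix(image_id: str, prefix: List[str]) -> List[str]:
    """Return the most likely completion of `prefix` for the given line image.

    A real system would run decoding constrained to continue the validated
    prefix; here we simply return the tail of a canned hypothesis.
    """
    return _CANNED[len(prefix):]


def decode_pen_strokes(strokes: str, prefix: List[str]) -> str:
    """Decode the user's pen-stroke feedback into a word.

    The multimodal idea is that the validated prefix can also constrain this
    feedback decoder, so both data streams help each other.  In this toy,
    the "strokes" are already the intended word.
    """
    return strokes


def interactive_transcription(image_id: str, reference: List[str]) -> List[str]:
    """Run the interaction loop; `reference` plays the role of the user."""
    prefix: List[str] = []
    corrections = 0
    while True:
        hypothesis = prefix + predict_suffix(image_id, prefix)
        if hypothesis == reference:
            break
        # The user validates the longest correct prefix of the hypothesis ...
        k = len(prefix)
        while k < min(len(hypothesis), len(reference)) and hypothesis[k] == reference[k]:
            k += 1
        if k == len(reference):
            # The hypothesis over-generates; the user simply stops here.
            hypothesis = reference
            break
        # ... and corrects the first wrong (or missing) word with pen strokes.
        corrected = decode_pen_strokes(reference[k], hypothesis[:k])
        prefix = hypothesis[:k] + [corrected]
        corrections += 1
    print(f"{image_id}: accepted after {corrections} correction(s)")
    return hypothesis


if __name__ == "__main__":
    interactive_transcription(
        "line_001.png",
        reference="the quick brown fox jumped over the lazy dog".split(),
    )
```

Running the script reports how many corrections the simulated user needed before the hypothesis matched the reference, which mirrors the kind of user-effort measure the abstract alludes to.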

Keywords: Multimodal interactive pattern recognition, Computer assisted transcription, Handwritten text recognition

Article history: Received 30 January 2009, Revised 28 October 2009, Accepted 19 November 2009, Available online 27 November 2009.

DOI: https://doi.org/10.1016/j.patcog.2009.11.019