End-to-end subtitle detection and recognition for videos in East Asian languages via CNN ensemble

Authors:

Highlights:

• An end-to-end subtitle detection and recognition system for East Asian languages is proposed and near-human-level recognition performance is achieved.

• A novel image operator that exploits sequence information across the video is proposed to detect the top and bottom boundaries of the subtitle and the width of a single character.

• A CNN ensemble is leveraged to classify East Asian characters across huge dictionaries. Visualization of the CNNs shows that different CNN models capture distinctive features of characters.

Keywords: Subtitle text detection, Subtitle text recognition, Synthetic training data, Convolutional neural networks, Video sequence information, East Asian language

Review history: Received 8 April 2017, Revised 30 September 2017, Accepted 30 September 2017, Available online 16 October 2017, Version of Record 24 October 2017.

Official paper URL: https://doi.org/10.1016/j.image.2017.09.013