End-to-end video text detection with online tracking

Authors:

Highlights:

• To the best of our knowledge, this is the first end-to-end video text detection and online tracking framework.

• Our end-to-end model is clear and interpretable. The proposed descriptor combines multiple interpretable features: appearance, geometry, and a pyramidal histogram of characters (PHOC).

• We introduce long short-term memory (LSTM) structures into our framework, which are very useful for capturing spatial-temporal information.

• Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art results on multiple public benchmarks.
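The highlights state that the descriptor combines appearance, geometry, and PHOC features. The following is a minimal sketch of how such a combined descriptor could be built and compared between frames; it is not the authors' implementation. Assumptions (all hypothetical): a unigram-only PHOC over lowercase letters and digits with pyramid levels 2 to 5 (the bigram level used in some PHOC variants is omitted), boxes given as (x, y, w, h) in pixels, an appearance vector supplied by some upstream network, per-part L2 normalization, and cosine similarity as the matching score. The helper names `phoc`, `geometry`, `descriptor`, and `cosine` are illustrative only.

```python
import math
import string
from fractions import Fraction

# Hypothetical alphabet: 26 lowercase letters + 10 digits.
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(2, 3, 4, 5), alphabet=ALPHABET):
    """Binary unigram PHOC: one alphabet-sized histogram per region per level."""
    word = word.lower()
    index = {c: i for i, c in enumerate(alphabet)}
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            hist = [0] * len(alphabet)
            r0, r1 = Fraction(region, level), Fraction(region + 1, level)
            for i, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c0, c1 = Fraction(i, n), Fraction(i + 1, n)
                overlap = max(Fraction(0), min(c1, r1) - max(c0, r0))
                # A character belongs to a region if at least half its span lies inside it.
                if overlap * 2 >= c1 - c0:
                    hist[index[ch]] = 1
            vec.extend(hist)
    return vec


def geometry(box, frame_w, frame_h):
    """Box (x, y, w, h) normalized by the frame size (assumed format)."""
    x, y, w, h = box
    return [x / frame_w, y / frame_h, w / frame_w, h / frame_h]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def descriptor(appearance, box, frame_size, transcription):
    """Concatenate L2-normalized appearance, geometry, and PHOC parts."""
    parts = [appearance, geometry(box, *frame_size), phoc(transcription)]
    out = []
    for part in parts:
        out.extend(l2_normalize([float(x) for x in part]))
    return out


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


if __name__ == "__main__":
    d_prev = descriptor([0.2, 0.5, 0.1], (10, 20, 50, 15), (640, 480), "exit")
    d_same = descriptor([0.2, 0.5, 0.1], (12, 21, 50, 15), (640, 480), "exit")
    d_other = descriptor([0.2, 0.5, 0.1], (10, 20, 50, 15), (640, 480), "open")
    print(cosine(d_prev, d_same), cosine(d_prev, d_other))
```

Normalizing each part separately is one way to keep the long PHOC block (36 × 14 = 504 entries here) from dominating the short geometry block; a tracker would then associate a new detection with the existing track whose descriptor scores highest.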


Keywords: End-to-end, Video text detection, Online tracking

Article history: Received 15 July 2019, Revised 2 August 2020, Accepted 14 December 2020, Available online 7 January 2021, Version of Record 20 January 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107791