End-to-end video text detection with online tracking
Authors:
Highlights:
• To the best of our knowledge, this is the first end-to-end video text detection and online tracking framework.
• Our end-to-end model is clear and interpretable. The descriptor we present combines multiple interpretable features: appearance, geometry, and pyramidal histogram of characters (PHOC).
• We incorporate long short-term memory (LSTM) structures into our framework, which helps capture spatial-temporal information.
• Extensive experiments demonstrate the effectiveness of our method, which achieves state-of-the-art results on multiple public benchmarks.
Keywords: End-to-end, Video text detection, Online tracking
Review history: Received 15 July 2019, Revised 2 August 2020, Accepted 14 December 2020, Available online 7 January 2021, Version of Record 20 January 2021.
Paper URL: https://doi.org/10.1016/j.patcog.2020.107791