VSRNet: End-to-end video segment retrieval with text query
Authors:
Highlights:
• We propose a novel framework that combines video retrieval and segment localization in a single network; joint training improves performance on both tasks.
• We introduce a text-aligned attention mechanism that efficiently generates temporal proposals, and a collaborative ranking strategy that improves video segment retrieval performance.
• Extensive experiments on DiDeMo and ActivityNet Captions demonstrate the superiority of our method on the video segment retrieval (VSR) task.
Keywords: Video segment retrieval, Video retrieval, Description localization
Review history: Received 9 July 2020, Revised 14 March 2021, Accepted 5 May 2021, Available online 23 May 2021, Version of Record 7 June 2021.
Official paper URL: https://doi.org/10.1016/j.patcog.2021.108027