VSRNet: End-to-end video segment retrieval with text query

Authors:

Highlights:

• We propose a novel framework that combines video retrieval and segment localization in one network; joint training improves performance on both tasks.

• We introduce a text-aligned attention mechanism to efficiently generate temporal proposals and a collaborative ranking strategy to improve the performance of video segment retrieval.

• Extensive experiments on DiDeMo and ActivityNet Captions demonstrate the superiority of our method on the video segment retrieval (VSR) task.

Keywords: Video segment retrieval, Video retrieval, Description localization

Review history: Received 9 July 2020, Revised 14 March 2021, Accepted 5 May 2021, Available online 23 May 2021, Version of Record 7 June 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2021.108027