VSRNet: End-to-end video segment retrieval with text query

Authors:

Highlights:

• We propose a novel framework that combines video retrieval and segment localization in a single network; joint training improves performance on both tasks.

• We introduce a text-aligned attention mechanism to efficiently generate temporal proposals, and a collaborative ranking strategy to improve video segment retrieval performance.

• Extensive experiments on DiDeMo and ActivityNet Captions demonstrate the superiority of our method on the video segment retrieval (VSR) task.

Keywords: Video segment retrieval, Video retrieval, Description localization

Review history: Received 9 July 2020, Revised 14 March 2021, Accepted 5 May 2021, Available online 23 May 2021, Version of Record 7 June 2021.

Paper URL: https://doi.org/10.1016/j.patcog.2021.108027