Learning to combine the modalities of language and video for temporal moment localization

Authors:

Highlights:

• CM-LSTM fuses visual and query features and encodes the contextual features.

• CM-LSTM brings improvements when applied to TML methods using the original LSTM.

• TACI outperforms state-of-the-art TML methods on the ActivityNet-Captions dataset.

Keywords: Temporal moment localization, Temporal video localization, Temporal video grounding, Cross-modal integration, Boundary alignment

Review history: Received 2 June 2021, Revised 13 January 2022, Accepted 17 January 2022, Available online 31 January 2022, Version of Record 7 February 2022.

Paper URL: https://doi.org/10.1016/j.cviu.2022.103375