Learning to combine the modalities of language and video for temporal moment localization
Authors:
Highlights:
• CM-LSTM fuses visual and query features and encodes the contextual features.
• CM-LSTM brings improvements when applied to temporal moment localization (TML) methods that use the original LSTM.
• TACI outperforms state-of-the-art TML methods on the ActivityNet Captions dataset.
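The highlights do not state the evaluation protocol. TML results on ActivityNet Captions are commonly reported as Recall@n at temporal IoU thresholds (for example 0.3, 0.5 and 0.7), so the following is a minimal sketch that assumes that protocol. It shows how a predicted (start, end) moment is scored against the ground-truth moment. The function names `temporal_iou` and `recall_at_1` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds; assumed start <= end


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time segments."""
    ps, pe = pred
    gs, ge = gt
    # Overlap length, clipped at zero when the segments are disjoint.
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    # Exact union: sum of both lengths minus the shared part.
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Segment], gts: List[Segment], threshold: float) -> float:
    """Fraction of queries whose top-1 prediction reaches IoU >= threshold."""
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    # Overlap of 4 s over a union of 8 s gives IoU 0.5.
    print(temporal_iou((2.0, 8.0), (4.0, 10.0)))
    print(recall_at_1([(2.0, 8.0), (0.0, 5.0)], [(4.0, 10.0), (5.0, 10.0)], 0.5))
```

Segments that only touch at a boundary, such as (0, 5) and (5, 10), have zero overlap and therefore score an IoU of 0.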
Keywords: Temporal moment localization, Temporal video localization, Temporal video grounding, Cross-modal integration, Boundary alignment
Article history: Received 2 June 2021, Revised 13 January 2022, Accepted 17 January 2022, Available online 31 January 2022, Version of Record 7 February 2022.
Paper URL: https://doi.org/10.1016/j.cviu.2022.103375