Dual relation network for temporal action localization

作者：

Highlights：

•

摘要

Temporal action localization is a challenging task for video understanding. Most previous methods process each proposal independently and neglect the reasoning of proposal-proposal and proposal-context relations. We argue that the supplementary information obtained by exploiting these relations can enhance the proposal representation and further boost the action localization. To this end, we propose a dual relation network to model both proposal-proposal and proposal-context relations. Concretely, a proposal-proposal relation module is leveraged to learn discriminative supplementary information from relevant proposals, which allows the network to model their interaction based on appearance and geometric similarities. Meanwhile, a proposal-context relation module is employed to mine contextual clues by adaptively learning from the global context outside of region-based proposals. They effectively leverage the inherent correlation between actions and the long-term dependency with videos for high-quality proposal refinement. As a result, the proposed framework enables the model to distinguish similar action instances and locate temporal boundaries more precisely. Extensive experiments on the THUMOS14 dataset and ActivityNet v1.3 dataset demonstrate that the proposed method significantly outperforms recent state-of-the-art methods.

论文关键词：Temporal action localization,Relation reasoning

论文评审过程：Received 30 November 2021, Revised 2 April 2022, Accepted 19 April 2022, Available online 22 April 2022, Version of Record 28 April 2022.

论文官网地址：https://doi.org/10.1016/j.patcog.2022.108725