TCLR: Temporal contrastive learning for video representation
Authors:
Highlights:
• TCLR is a contrastive learning framework for video understanding tasks.
• Explicitly enforces within-instance temporal feature variation without pretext tasks.
• Proposes novel local–local and global–local temporal contrastive losses.
• Significantly outperforms state-of-the-art pre-training on video understanding tasks.
• Uses a fine-grained action classification task to evaluate learned representations.
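
The highlights name a local–local temporal contrastive loss but do not give its form. The sketch below is one plausible reading, not the authors' implementation. It assumes an InfoNCE-style objective computed within a single video: the positive pair is the same timestep under two augmentations, and every other timestep of that video is a negative. The function name `local_local_loss`, the helper `_normalize`, the `temperature` value, and the input layout are all hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm (hypothetical helper)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def local_local_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss over the clips of ONE video (assumed formulation).

    view_a, view_b: lists of T embedding vectors; entry t in each list is the
    embedding of the clip at timestep t under a different augmentation.
    Positive: same timestep, other augmentation.
    Negatives: all other timesteps of the same video, from both views.
    """
    T = len(view_a)
    z = [_normalize(v) for v in view_a] + [_normalize(v) for v in view_b]
    total = 0.0
    for i in range(2 * T):
        pos = (i + T) % (2 * T)  # index of the same timestep in the other view
        # Scaled cosine similarities to every embedding except the anchor itself.
        logits = [
            sum(p * q for p, q in zip(z[i], z[j])) / temperature
            for j in range(2 * T)
            if j != i
        ]
        pos_logit = sum(p * q for p, q in zip(z[i], z[pos])) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * T)


# Temporally distinct features: each timestep has its own direction.
distinct = [[1.0 if d == t else 0.0 for d in range(4)] for t in range(4)]
# Temporally collapsed features: every timestep has the same embedding.
collapsed = [[1.0, 1.0, 1.0, 1.0] for _ in range(4)]

print(local_local_loss(distinct, distinct))
print(local_local_loss(collapsed, collapsed))
```

Under this assumed formulation, the collapsed case is penalized more heavily than the distinct case, which is the sense in which such a loss would enforce temporal feature variation within an instance.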
Keywords: Self-Supervised Learning, Action Recognition, Video Representation
Review history: Received 10 August 2021, Revised 7 January 2022, Accepted 5 March 2022, Available online 16 March 2022, Version of Record 5 April 2022.
Official paper URL: https://doi.org/10.1016/j.cviu.2022.103406