Self-supervised video representation learning by maximizing mutual information
Authors:
Highlights:
• We propose a novel self-supervised task, DVIM, for representation learning from unlabeled videos.
• We design a neural network architecture for DVIM, comprising a feature extractor and a network for maximizing mutual information.
• Experimental results demonstrate that DVIM can serve as an effective pre-training method for action recognition in videos.
• Experiments on action similarity labeling demonstrate that the representations learned by DVIM can be transferred to other visual tasks.
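The highlights do not state which estimator DVIM uses to maximize mutual information. As a minimal illustrative sketch, assuming an InfoNCE-style contrastive lower bound (a common choice in mutual-information-based representation learning, not confirmed as this paper's method), the quantity being maximized can be computed from an N x N matrix of critic scores, where matched pairs sit on the diagonal and the other entries in the batch act as negatives. The function name `infonce_lower_bound` and the score-matrix layout are hypothetical.

```python
import math
from typing import List


def infonce_lower_bound(scores: List[List[float]]) -> float:
    """Hypothetical InfoNCE estimate of I(X; Y) in nats (assumed estimator).

    scores[i][j] is a critic score f(x_i, y_j). Diagonal entries are the
    matched (positive) pairs; off-diagonal entries are in-batch negatives.
    Returns log(N) - L_NCE, which lower-bounds the mutual information and
    can never exceed log(N).
    """
    n = len(scores)
    if n == 0 or any(len(row) != n for row in scores):
        raise ValueError("scores must be a non-empty square matrix")

    total = 0.0
    for i, row in enumerate(scores):
        # Numerically stable log-sum-exp over all candidates in the row.
        m = max(row)
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        # Cross-entropy of identifying the matched pair i among N candidates.
        total += log_sum - row[i]

    loss = total / n
    return math.log(n) - loss


if __name__ == "__main__":
    # Illustrative scores only: a critic that separates matched pairs well
    # yields a bound close to log(3); an uninformative critic yields ~0.
    separated = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
    uninformative = [[1.0] * 3 for _ in range(3)]
    print(infonce_lower_bound(separated))
    print(infonce_lower_bound(uninformative))
```

Training a feature extractor under this assumption would mean adjusting its parameters so that this bound increases; because the bound is capped at log(N), larger batches of negatives allow a larger estimate.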
Keywords: Self-supervised learning, Deep learning, Video representation, Mutual information, Action recognition
Article history: Received 7 November 2019, Revised 14 June 2020, Accepted 2 August 2020, Available online 12 August 2020, Version of Record 17 August 2020.
Official paper URL: https://doi.org/10.1016/j.image.2020.115967