2D progressive fusion module for action recognition

Abstract

Network convergence and recognition accuracy are essential issues when applying Convolutional Neural Networks (CNNs) to human action recognition. Most deep learning methods neglect model convergence while striving to improve abstraction capability, which sharply degrades performance when computing resources are limited. To mitigate this problem, we propose a structure named the 2D Progressive Fusion (2DPF) module, which is inserted after 2D backbone CNN layers. 2DPF fuses features through a novel 2D convolution over the spatial and temporal dimensions, called variation attenuating convolution, and applies fusion techniques to improve recognition accuracy and convergence. Our experiments on several benchmarks (e.g., Something-Something V1 & V2, Kinetics-400 & 600, AViD, UCF101) demonstrate the effectiveness of the proposed method.
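To illustrate the general idea of inserting a spatiotemporal fusion block after a 2D backbone layer, the sketch below is a minimal, hypothetical PyTorch module, not the authors' 2DPF or variation attenuating convolution. It assumes backbone features shaped (N*T, C, H, W), a depthwise convolution over a (time, flattened-space) plane, a residual connection, and the kernel sizes shown; all of these are illustrative assumptions.

```python
# Hypothetical sketch: a residual block that reshapes 2D-backbone features so a
# single nn.Conv2d mixes information along the temporal axis and a flattened
# spatial axis. Kernel sizes and the residual design are assumptions, not the
# paper's exact 2DPF module.
import torch
import torch.nn as nn


class TemporalSpatialFusion(nn.Module):
    def __init__(self, channels: int, kernel_t: int = 3, kernel_s: int = 3):
        super().__init__()
        # Depthwise 2D convolution applied on the (time, flattened-space) plane.
        self.fuse = nn.Conv2d(
            channels, channels,
            kernel_size=(kernel_t, kernel_s),
            padding=(kernel_t // 2, kernel_s // 2),
            groups=channels,
        )

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (N * T, C, H, W) features from a 2D backbone layer.
        nt, c, h, w = x.shape
        n = nt // num_frames
        # Rearrange to (N, C, T, H*W) so one Conv2d sees time and space together.
        feats = x.view(n, num_frames, c, h * w).permute(0, 2, 1, 3)
        fused = self.fuse(feats)
        # Restore the original layout and add a residual connection.
        fused = fused.permute(0, 2, 1, 3).reshape(nt, c, h, w)
        return x + fused


if __name__ == "__main__":
    block = TemporalSpatialFusion(channels=64)
    clip = torch.randn(2 * 8, 64, 14, 14)  # 2 clips of 8 frames each
    out = block(clip, num_frames=8)
    print(out.shape)  # torch.Size([16, 64, 14, 14])
```

Because the block preserves the (N*T, C, H, W) layout and adds its output residually, it can in principle be dropped between existing 2D backbone stages without changing their interfaces, which is the kind of insertion the abstract describes.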

Keywords: Convergence, spatiotemporal modeling, 2D CNN, action recognition

Article history: Received 26 November 2020, Revised 23 January 2021, Accepted 28 January 2021, Available online 18 February 2021, Version of Record 21 March 2021.

DOI: https://doi.org/10.1016/j.imavis.2021.104122