TMF: Temporal Motion and Fusion for action recognition

Authors:

Highlights:

Abstract

Temporal motion information plays an important role in video understanding, human action recognition, and related fields. Optical flow, which carries rich temporal motion information, is widely used across visual tasks and achieves strong performance. However, extracting optical flow is computationally expensive and time-consuming. In this paper, we propose a Temporal Motion and Fusion (TMF) module consisting of a motion extraction (ME) module and a temporal crossing fusion (TCF) module. The ME module replaces traditional optical flow: it establishes matching relationships between adjacent frames on the convolutional feature maps and then extracts simple yet effective short-term motion information. The TCF module crosses adjacent frames and fuses information from non-adjacent video frames to model long-term motion. Finally, the extracted motion information is fused with the appearance information captured by 2D convolution for the final recognition. Experiments show that, with only a small increase in parameters and computational cost, our lightweight model achieves state-of-the-art results on Something-Something V1 & V2 and Diving-48, and competitive results among single models on HMDB-51 and UCF-101.
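The abstract does not give the ME or TCF equations, so the sketch below is only a generic illustration of the idea of computing motion on feature maps instead of optical flow. It uses the simplest such surrogate, the difference between the features of frame t+1 and frame t, and fuses it with appearance features by residual addition. This is an assumption, not the paper's ME module: the functions `short_term_motion` and `fuse`, the flattened per-frame feature layout, and the `alpha` weight are all hypothetical.

```python
from typing import List

# One frame's feature map, flattened to C*H*W values (hypothetical layout).
Frame = List[float]


def short_term_motion(features: List[Frame]) -> List[Frame]:
    """Hypothetical stand-in for ME: feature difference between frames t+1 and t.

    The last frame has no successor, so its motion is zero-padded to keep
    one motion map per input frame.
    """
    num_frames = len(features)
    motion: List[Frame] = []
    for t in range(num_frames):
        if t + 1 < num_frames:
            motion.append([b - a for a, b in zip(features[t], features[t + 1])])
        else:
            motion.append([0.0] * len(features[t]))
    return motion


def fuse(appearance: List[Frame], motion: List[Frame], alpha: float = 1.0) -> List[Frame]:
    """Hypothetical fusion: add weighted motion features to appearance features."""
    return [
        [a + alpha * m for a, m in zip(frame_a, frame_m)]
        for frame_a, frame_m in zip(appearance, motion)
    ]


if __name__ == "__main__":
    feats = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0]]
    mot = short_term_motion(feats)
    print(mot)               # [[1.0, 0.0], [2.0, -1.0], [0.0, 0.0]]
    print(fuse(feats, mot))  # [[1.0, 1.0], [3.0, 0.0], [3.0, 0.0]]
```

A difference of features is cheap because it reuses maps the 2D backbone already computes, which matches the abstract's motivation of avoiding a separate optical-flow pass. The paper's actual ME module builds explicit matching relationships between adjacent frames and is likely more elaborate than this.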

Keywords:

Publication history: Received 10 March 2021, Revised 30 July 2021, Accepted 6 October 2021, Available online 12 October 2021, Version of Record 22 October 2021.

Official paper URL: https://doi.org/10.1016/j.cviu.2021.103304