Action Recognition Using Multiple Pooling Strategies of CNN Features

Authors: Haifeng Hu, Zhongke Liao, Xiang Xiao

Abstract

Deep convolutional neural networks have shown great potential for human action recognition. To obtain a compact and discriminative feature representation, this paper proposes multiple pooling strategies for CNN features. We explore three pooling strategies: space-time feature pooling (STFP), time filter pooling (TFP), and spatio-temporal pyramid pooling (STPP). STFP combines the advantages of hand-crafted features and deep ConvNet features. TFP captures how the elements of each CNN feature map change over time. STPP models the spatial and temporal pyramid structure of the feature maps. We aggregate these pooled features into a new, discriminative video descriptor. Experimental results on the challenging YouTube, UCF50 and UCF101 datasets show that the three strategies are complementary, and that our video representation is comparable to previous state-of-the-art algorithms.

Keywords: Action recognition, Convolutional neural networks, Multiple pooling strategies

Official paper page: https://doi.org/10.1007/s11063-018-9932-3