Action recognition based on joint trajectory maps with convolutional neural networks

Authors:

Highlights:

Abstract

Convolutional Neural Networks (ConvNets) have recently shown promising performance in many computer vision tasks, especially image-based recognition. How to effectively apply ConvNets to sequence-based data remains an open problem. This paper proposes an effective yet simple method that represents the spatio-temporal information carried in 3D skeleton sequences as three 2D images, referred to as Joint Trajectory Maps (JTM), by encoding the joint trajectories and their dynamics into the color distribution of the images, and adopts ConvNets to learn discriminative features for human action recognition. Such an image-based representation enables us to fine-tune existing ConvNet models for the classification of skeleton sequences without training the networks afresh. The three JTMs are generated in three orthogonal planes and provide complementary information to each other. The final recognition is further improved through multiplicative score fusion of the three JTMs. The proposed method was evaluated on four public benchmark datasets, the large NTU RGB+D Dataset, the MSRC-12 Kinect Gesture Dataset (MSRC-12), the G3D Dataset and the UTD Multimodal Human Action Dataset (UTD-MHAD), and achieved state-of-the-art results.
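The core idea described above can be sketched as follows. This is a minimal illustration, not the paper's exact encoding: the paper's JTMs additionally modulate hue, saturation and brightness by motion dynamics, whereas this sketch only maps frame index to hue. All function names and array shapes here are assumptions for the example.

```python
import colorsys
import numpy as np

def joint_trajectory_map(skeleton, size=64, plane=(0, 1)):
    """Rasterize a skeleton sequence of shape (T frames, J joints, 3 coords)
    into a size x size RGB image: each joint position is projected onto one
    orthogonal plane and drawn with a hue proportional to its frame index,
    so the color distribution carries the temporal order of the trajectory."""
    T, J, _ = skeleton.shape
    img = np.zeros((size, size, 3))
    pts = skeleton[:, :, plane]                      # project onto chosen plane
    lo = pts.min(axis=(0, 1))
    hi = pts.max(axis=(0, 1))
    norm = (pts - lo) / np.maximum(hi - lo, 1e-8)    # normalize coords to [0, 1]
    for t in range(T):
        # hue sweeps through the color wheel from first to last frame
        rgb = colorsys.hsv_to_rgb(t / max(T - 1, 1), 1.0, 1.0)
        for j in range(J):
            x, y = (norm[t, j] * (size - 1)).astype(int)
            img[y, x] = rgb
    return img

# Three JTMs, one per orthogonal plane (xy, xz, yz), as in the paper
seq = np.random.rand(30, 20, 3)                      # toy 30-frame, 20-joint clip
jtms = [joint_trajectory_map(seq, plane=p) for p in [(0, 1), (0, 2), (1, 2)]]

# Multiplicative score fusion: element-wise product of the three per-class
# score vectors (random stand-ins here for the three ConvNets' softmax outputs)
scores = [np.random.dirichlet(np.ones(10)) for _ in range(3)]
fused = np.prod(scores, axis=0)
pred = int(fused.argmax())                           # fused class prediction
```

In practice each of the three images would be fed to a ConvNet fine-tuned from an ImageNet-pretrained model, and the three softmax score vectors would be fused multiplicatively as above before taking the argmax.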

Keywords: Action recognition, Trajectory, Color encoding, Convolutional neural network

Publication history: Received 2 October 2017, Revised 19 May 2018, Accepted 21 May 2018, Available online 15 June 2018, Version of Record 6 July 2018.

DOI: https://doi.org/10.1016/j.knosys.2018.05.029