Deep ensemble network using distance maps and body part features for skeleton based action recognition

Authors:

Abstract

Human action recognition is an active research topic in computer vision. The availability of low-cost depth sensors has made it easier to extract reliable skeleton maps of human subjects. This paper proposes three subnets, referred to as SNet, TNet, and BodyNet, to capture diverse spatio-temporal dynamics for the action recognition task. Specifically, SNet captures pose dynamics from distance maps in the spatial domain. The second subnet, TNet, captures temporal dynamics along the sequence. The third subnet, BodyNet, extracts distinct features from fine-grained body parts in the temporal domain. Motivated by ensemble learning, a hybrid network, referred to as HNet, is built from two of the subnets (TNet and BodyNet) to capture robust temporal dynamics. Finally, SNet and HNet are fused into one ensemble network for action classification. The method achieves competitive results on three widely used datasets: UTD MHAD, UT Kinect, and NTU RGB+D.
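As a rough illustration of two ideas named in the abstract, the sketch below computes a per-frame distance map from skeleton joints and fuses the class scores of two networks. The abstract does not define either step, so both are assumptions here: the distance map is taken to be the pairwise Euclidean distance between joints, and the fusion is taken to be a weighted average of class scores. The function names `joint_distance_maps` and `fuse_scores`, and the `weight` parameter, are hypothetical and not taken from the paper.

```python
import numpy as np


def joint_distance_maps(skeleton):
    """Pairwise joint distances for every frame (assumed definition).

    skeleton: array of shape (T, J, 3) holding T frames of J 3D joints.
    Returns an array of shape (T, J, J) where entry [t, i, j] is the
    Euclidean distance between joints i and j in frame t.
    """
    skeleton = np.asarray(skeleton, dtype=np.float64)
    # Broadcast to (T, J, J, 3): difference between every pair of joints.
    diff = skeleton[:, :, None, :] - skeleton[:, None, :, :]
    return np.linalg.norm(diff, axis=-1)


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted average of two class-score vectors (assumed fusion rule)."""
    a = np.asarray(scores_a, dtype=np.float64)
    b = np.asarray(scores_b, dtype=np.float64)
    return weight * a + (1.0 - weight) * b


if __name__ == "__main__":
    # Toy sequence: 2 frames, 3 joints; joint 1 sits at (3, 4, 0).
    toy = np.zeros((2, 3, 3))
    toy[:, 1, 0] = 3.0
    toy[:, 1, 1] = 4.0
    maps = joint_distance_maps(toy)
    print(maps.shape)

    # Hypothetical class scores from a spatial and a temporal network.
    fused = fuse_scores([0.2, 0.8], [0.6, 0.4])
    print(int(np.argmax(fused)))
```

Each `(J, J)` slice is symmetric with a zero diagonal, so it can be treated as a single-channel image for a convolutional network. How the paper actually encodes the maps and fuses SNet with HNet may differ from this sketch.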

Keywords: Human action recognition, Distance maps, Part features, Convolutional neural networks, Long short-term memory

Article history: Received 31 October 2018, Revised 26 October 2019, Accepted 21 November 2019, Available online 22 November 2019, Version of Record 28 November 2019.

Official paper URL: https://doi.org/10.1016/j.patcog.2019.107125