An asymmetrical-structure auto-encoder for unsupervised representation learning of skeleton sequences

Authors:

Highlights:

Abstract

This study addresses the problem of learning action representations from skeleton sequences. We propose a novel framework for unsupervised representation learning built on an asymmetrical-structure auto-encoder, in which a 2D-CNN-based encoder learns separable spatiotemporal representations in a low-dimensional feature space under the supervision of salient skeleton motion cues. Through the proposed unsupervised training, the network captures both the correlations between adjacent joints and long-term motion dependencies, so that similar movements gather around the same cluster while different movements gather around distinct clusters. The method does not rely on annotations to associate skeleton sequences with actions. Experimental results demonstrate the effectiveness of the proposed representation learning and show improvements over skeleton-based generative learning methods. When the proposed network is fine-tuned with only partially labeled data, its results still outperform some fully supervised methods.
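The abstract does not say how the "salient skeleton motion cues" are computed. As a rough illustration only, the sketch below assumes a cue can be approximated by the per-joint displacement between consecutive frames. The function names `motion_cues` and `salient_joints` and the toy sequence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) coordinates of one joint
Frame = List[Joint]                  # all joints at one time step


def motion_cues(sequence: List[Frame]) -> List[List[float]]:
    """Per-joint Euclidean displacement between consecutive frames.

    Assumption: this stands in for a "motion cue"; the paper may define it
    differently. Returns a (T-1) x J grid for T frames and J joints.
    """
    cues = []
    for prev, curr in zip(sequence, sequence[1:]):
        row = []
        for (x0, y0, z0), (x1, y1, z1) in zip(prev, curr):
            row.append(((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5)
        cues.append(row)
    return cues


def salient_joints(cues: List[List[float]], top_k: int = 1) -> List[int]:
    """Indices of the joints with the largest total displacement (hypothetical saliency rule)."""
    n_joints = len(cues[0])
    totals = [sum(row[j] for row in cues) for j in range(n_joints)]
    return sorted(range(n_joints), key=lambda j: totals[j], reverse=True)[:top_k]


# Toy sequence: 3 frames, 2 joints. Joint 0 is static; joint 1 moves.
seq = [
    [(0, 0, 0), (0, 0, 0)],
    [(0, 0, 0), (3, 0, 0)],
    [(0, 0, 0), (3, 4, 0)],
]
cues = motion_cues(seq)
salient = salient_joints(cues, top_k=1)
```

A frames-by-joints grid of this kind is the sort of 2D layout that a 2D-CNN encoder could consume, although the actual input encoding used by the authors is not stated in the abstract.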

Keywords:

Review history: Received 14 September 2021, Revised 13 June 2022, Accepted 15 June 2022, Available online 20 June 2022, Version of Record 24 June 2022.

Paper URL: https://doi.org/10.1016/j.cviu.2022.103491