Action Transformer: A self-attention model for short-time pose-based human action recognition

Authors:

Highlights:

• We study the application of the Transformer encoder to 2D pose-based HAR and propose the novel AcT model.

• We introduce MPOSE2021, a dataset for real-time short-time HAR. In contrast to other publicly available datasets, its constrained number of time steps encourages the development of truly real-time methodologies that perform HAR with low latency and high throughput.

• We conduct extensive experimentation on model performance and latency to verify the suitability of AcT for real-time applications.
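The highlights describe applying a Transformer encoder to sequences of 2D poses. The following minimal numpy sketch illustrates the general shape of such a model (linear embedding of per-frame keypoints, a learnable class token, positional encodings, one self-attention block, and a classification head); all dimensions, weight initialisations, and the single-block depth are illustrative assumptions, not the actual AcT architecture or its trained parameters.

```python
import numpy as np

# Hypothetical dimensions: T frames per clip, 2D keypoints flattened to D_IN
# coordinates per frame, a D_MODEL embedding, and N_CLASSES action labels.
T, D_IN, D_MODEL, N_CLASSES = 30, 26, 64, 20

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the time axis."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

# Randomly initialised weights stand in for trained parameters.
w_embed = rng.normal(size=(D_IN, D_MODEL)) * 0.02
cls_tok = rng.normal(size=(1, D_MODEL)) * 0.02
pos_enc = rng.normal(size=(T + 1, D_MODEL)) * 0.02
wq, wk, wv = (rng.normal(size=(D_MODEL, D_MODEL)) * 0.02 for _ in range(3))
w_head = rng.normal(size=(D_MODEL, N_CLASSES)) * 0.02

def act_forward(poses):
    """Sketch of a Transformer-encoder forward pass for pose-based HAR:
    embed frames, prepend a class token, add positional encodings,
    apply one residual attention block, classify from the class token."""
    x = poses @ w_embed                    # (T, D_MODEL)
    x = np.vstack([cls_tok, x]) + pos_enc  # (T + 1, D_MODEL)
    x = x + self_attention(x, wq, wk, wv)  # residual attention block
    return x[0] @ w_head                   # class logits from the token

logits = act_forward(rng.normal(size=(T, D_IN)))
print(logits.shape)  # (20,)
```

Because the clip length is fixed, the whole forward pass is a handful of dense matrix products with no recurrence, which is what makes low-latency, high-throughput inference plausible for this family of models.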


Keywords: Human action recognition, Deep learning, Computer vision, Transformer

Article history: Received 2 August 2021, Revised 26 November 2021, Accepted 4 December 2021, Available online 15 December 2021, Version of Record 20 December 2021.

Article link: https://doi.org/10.1016/j.patcog.2021.108487