Video sketch: A middle-level representation for action recognition
Authors: Xing-Yuan Zhang, Ya-Ping Huang, Yang Mi, Yan-Ting Pei, Qi Zou, Song Wang
Abstract
Different modalities extracted from videos, such as RGB frames and optical flow, may provide complementary cues for improving video action recognition. In this paper, we introduce a new modality, named video sketch, which encodes human shape information, as a complementary modality for video action representation. We show that video action recognition can be enhanced by using the proposed video sketch. More specifically, we first generate video sketches with class-distinctive action areas and then employ a two-stream network to combine the shape information extracted from image-based and point-based sketches, fusing the classification scores of the two streams to produce a shape representation for videos. Finally, we use this shape representation as a complement to the traditional appearance (RGB) and motion (optical flow) representations for the final video classification. We conduct extensive experiments on five human action recognition datasets: KTH, HMDB51, UCF101, Something-Something, and UTI. The experimental results show that the proposed method outperforms existing state-of-the-art action recognition methods.
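The abstract describes late score fusion: each stream (shape from video sketch, appearance from RGB, motion from optical flow) produces per-class scores, which are then combined. Below is a minimal illustrative sketch of such weighted score fusion; the fusion weights, the uniform weighting default, and the function name are assumptions for illustration, not values or APIs from the paper.

```python
import numpy as np

def fuse_scores(shape_scores, rgb_scores, flow_scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of per-class softmax scores from three streams.

    All score arrays have shape (num_classes,). The weights are
    hypothetical; the paper's actual fusion scheme may differ.
    """
    w_shape, w_rgb, w_flow = weights
    fused = w_shape * shape_scores + w_rgb * rgb_scores + w_flow * flow_scores
    return fused / sum(weights)

# Example: three streams voting over 51 classes (e.g., HMDB51),
# with random softmax-like scores standing in for real network outputs.
num_classes = 51
rng = np.random.default_rng(0)
shape_s, rgb_s, flow_s = (rng.dirichlet(np.ones(num_classes)) for _ in range(3))
fused = fuse_scores(shape_s, rgb_s, flow_s)
print("predicted class:", int(np.argmax(fused)))
```

The same weighted-average pattern applies at both fusion points mentioned in the abstract: combining the image-based and point-based sketch streams into a shape score, and combining the shape, RGB, and optical-flow scores for the final prediction.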
Keywords: Action recognition, video sketch, attention model
Paper URL: https://doi.org/10.1007/s10489-020-01905-y