Weakly supervised action anticipation without object annotations
Authors: Yi Zhong, Jia-Hui Pan, Haoxin Li, Wei-Shi Zheng
Abstract
Anticipating future actions without observing any partial video of those actions plays an important role in action prediction and is also a challenging task. To obtain rich information for action anticipation, some methods integrate multimodal contexts, including scene object labels. However, exhaustively labelling each frame in video datasets requires considerable effort. In this paper, we develop a weakly supervised method that integrates global motion and local fine-grained features from current action videos to predict the next action label, without the need for specific scene context labels. Specifically, we extract diverse types of local features with weakly supervised learning, including object appearance and human pose representations, without ground-truth annotations. Moreover, we construct a graph convolutional network to exploit the inherent relationships between humans and objects in the current scene. We evaluate the proposed model on two datasets, the MPII-Cooking dataset and the EPIC-Kitchens dataset, and demonstrate the generalizability and effectiveness of our approach for action anticipation.
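The abstract describes a graph convolutional network over human and object nodes. The sketch below is not from the paper; it is a minimal, generic GCN layer (symmetric normalized adjacency with self-loops, linear map, ReLU) applied to a hypothetical toy graph with one human-pose node and two object-appearance nodes, to illustrate the kind of relation modelling involved. All names, dimensions, and the graph itself are illustrative assumptions.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: symmetrically normalized
    neighbourhood aggregation, a linear map, then ReLU."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)        # D^{-1/2}
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(norm_adj @ feats @ weight, 0.0)

rng = np.random.default_rng(0)
# Toy graph (hypothetical): node 0 is a human-pose feature,
# nodes 1-2 are object-appearance features; the human node is
# connected to both objects.
feats = rng.standard_normal((3, 16))
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
weight = rng.standard_normal((16, 8))
out = gcn_layer(adj, feats, weight)
print(out.shape)  # (3, 8): one updated embedding per node
```

In the actual method, the node features would come from the weakly supervised object and pose extractors, and the updated node embeddings would feed the anticipation classifier.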
Keywords: action anticipation, weakly supervised learning, relation modelling, graph convolutional network
Paper link: https://doi.org/10.1007/s11704-022-1167-9