Weakly supervised temporal action localization with proxy metric modeling

作者:Hongsheng Xu, Zihan Chen, Yu Zhang, Xin Geng, Siya Mi, Zhihong Yang

摘要

Temporal localization is crucial for action video recognition. Since the manual annotations are expensive and time-consuming in videos, temporal localization with weak video-level labels is challenging but indispensable. In this paper, we propose a weakly-supervised temporal action localization approach in untrimmed videos. To settle this issue, we train the model based on the proxies of each action class. The proxies are used to measure the distances between action segments and different original action features. We use a proxy-based metric to cluster the same actions together and separate actions from backgrounds. Compared with state-of-the-art methods, our method achieved competitive results on the THUMOS14 and ActivityNet1.2 datasets.

论文关键词:temporal action localization, weakly supervised videos, proxy metric

论文评审过程:

论文官网地址:https://doi.org/10.1007/s11704-022-1154-1