TAB: Temporally aggregated bag-of-discriminant-words for temporal action proposals

Authors:

Highlights:

Abstract

In this work, we propose Temporally Aggregated Bag-of-Discriminant-Words (TAB), a new method for generating temporal action proposals from long untrimmed videos. TAB is based on the observation that there are many overlapping frames between action and background temporal regions of untrimmed videos, which makes it difficult to segment actions from non-action regions. TAB addresses this issue by extracting class-specific codewords from action and background videos and computing a discriminative weight for each codeword based on its ability to separate the two classes. We integrate these discriminative weights into Bag-of-Words encoding, yielding a representation we call Bag-of-Discriminant-Words (BoDW). We sample untrimmed videos into non-overlapping snippets and temporally aggregate the BoDW representations of multiple snippets into action proposals in a single pass, using a binary classifier trained on trimmed videos. TAB can be used with different types of features, including those computed by deep networks. We demonstrate the effectiveness of TAB on two challenging temporal action detection datasets, MSR-II and Thumos14, where it improves upon the state of the art with recall rates of 87.0% and 82.0%, respectively, at a temporal intersection-over-union threshold of 0.8.
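The core encoding step described above can be sketched as follows. This is a minimal illustration under assumed details (hard nearest-codeword assignment, L1-normalised histogram, element-wise re-weighting); the function name, array shapes, and the example weights are hypothetical and not taken from the paper:

```python
import numpy as np

def bodw_encode(descriptors, codebook, disc_weights):
    """Encode one video snippet as a Bag-of-Discriminant-Words vector.

    Hypothetical sketch: `descriptors` is an (n, d) array of local
    features from the snippet, `codebook` a (k, d) array of codewords,
    and `disc_weights` a (k,) vector of per-codeword discriminative
    weights (assumed precomputed from action vs. background videos).
    """
    # Hard-assign each descriptor to its nearest codeword (squared L2).
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignments = dists.argmin(axis=1)

    # Standard Bag-of-Words histogram over codeword assignments.
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    hist /= max(hist.sum(), 1.0)  # L1-normalise

    # Re-weight each bin by its discriminative weight: the BoDW vector.
    return hist * disc_weights
```

Snippet-level BoDW vectors like these would then be scored by a binary classifier and aggregated over consecutive snippets to form temporal proposals.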

Keywords:

Review timeline: Received 22 May 2018, Revised 15 January 2019, Accepted 17 April 2019, Available online 25 April 2019, Version of Record 15 May 2019.

Paper URL: https://doi.org/10.1016/j.cviu.2019.04.008