Temporal-adaptive sparse feature aggregation for video object detection

Authors:

Highlights:

• We propose a novel temporal-adaptive sparse feature aggregation framework for challenging video object detection.

• A stride predictor is proposed to adaptively select the aggregated frames with a temporal-adaptive sparse sampling strategy.

• A pixel-adaptive aggregation module is proposed to enhance pixel-level feature quality using aligned features from nearby frames.

• An object-relational aggregation module is proposed to further enhance the proposal features with a graph-based module.

• Our framework aggregates fewer frames than traditional dense aggregation methods and achieves state-of-the-art performance.
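
As a rough illustration of the sparse sampling idea in the highlights above, the sketch below picks support frames around a reference frame at a given stride. It is not the authors' implementation: the function name `sparse_support_frames`, the `per_side` parameter, and the clamping behavior at video boundaries are assumptions made for this example, and the stride is passed in as a plain integer, whereas the highlights describe it as the output of a stride predictor.

```python
def sparse_support_frames(ref_idx, stride, num_frames, per_side=2):
    """Return sorted support-frame indices sampled at `stride` around `ref_idx`.

    Hypothetical helper: in the described framework the stride would come
    from a stride predictor; here it is simply an integer argument.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if not 0 <= ref_idx < num_frames:
        raise ValueError("ref_idx must lie inside the video")

    # Symmetric offsets: -per_side*stride, ..., -stride, +stride, ..., +per_side*stride
    offsets = [k * stride for k in range(-per_side, per_side + 1) if k != 0]

    # Clamp to the valid frame range (an assumption of this sketch),
    # then drop duplicates and the reference frame itself.
    clamped = {min(max(ref_idx + o, 0), num_frames - 1) for o in offsets}
    clamped.discard(ref_idx)
    return sorted(clamped)


if __name__ == "__main__":
    # A larger stride spreads the same number of support frames over a wider window.
    print(sparse_support_frames(ref_idx=10, stride=1, num_frames=100))
    print(sparse_support_frames(ref_idx=10, stride=3, num_frames=100))
```

With a fixed `per_side`, the number of aggregated frames stays constant while the stride controls how wide a temporal window they cover, which is one simple way to read "fewer frames than dense aggregation".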

Keywords: Video object detection, Temporal-adaptive sparse sampling, Pixel-adaptive aggregation, Object-relational aggregation

Review history: Received 1 April 2021, Revised 22 December 2021, Accepted 11 February 2022, Available online 13 February 2022, Version of Record 26 February 2022.

Official paper URL: https://doi.org/10.1016/j.patcog.2022.108587