Learning multiscale hierarchical attention for video summarization

Authors:

Highlights:

• Our method learns multiscale features: the intra-block attention module learns attentive frame-level features inside each block, while the inter-block attention module learns attentive block-level features.

• We further extend our method into a two-stream framework that leverages both appearance and motion information.

• We conduct comprehensive experiments on two widely used video summarization datasets. Our method achieves performance competitive with state-of-the-art methods.

Abstract

• Unlike conventional works that employ either a hierarchical RNN for structure modeling or a self-attention mechanism for long-range dependencies, our method applies intra-block attention and inter-block attention to learn both the underlying structure and long-range representations.

• Our method learns multiscale features: the intra-block attention module learns attentive frame-level features inside each block, while the inter-block attention module learns attentive block-level features.

• We further extend our method into a two-stream framework that leverages both appearance and motion information.

• We conduct comprehensive experiments on two widely used video summarization datasets. Our method achieves performance competitive with state-of-the-art methods.
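
The intra-block and inter-block attention described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it is a toy under stated assumptions: plain scaled dot-product self-attention with no learned projections, fixed-size blocks, mean pooling to form block-level features, and additive fusion of the two scales. The names `self_attention`, `hierarchical_attention`, and `block_size` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Scaled dot-product self-attention over the rows of x, shape (n, d).

    Assumption: queries, keys and values are all x itself
    (no learned projection matrices), to keep the sketch short.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)        # (n, n) pairwise similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ x                   # (n, d) attended features


def hierarchical_attention(frames, block_size):
    """Toy two-level attention over frame features of shape (T, d).

    1. Intra-block: attend among the frames inside each block.
    2. Inter-block: mean-pool each block (an assumption), then attend
       among the resulting block-level features.
    3. Fuse: add each block's feature back to its frames (an assumption).
    """
    frames = np.asarray(frames, dtype=float)
    T, d = frames.shape
    out = np.zeros((T, d))
    spans, block_feats = [], []

    for start in range(0, T, block_size):
        end = min(start + block_size, T)
        attended = self_attention(frames[start:end])  # frame-level features
        out[start:end] = attended
        block_feats.append(attended.mean(axis=0))     # block summary
        spans.append((start, end))

    block_feats = self_attention(np.stack(block_feats))  # block-level features

    for (start, end), feat in zip(spans, block_feats):
        out[start:end] += feat  # broadcast block context to its frames

    return out


rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 8))        # 10 frames, 8-dim features
features = hierarchical_attention(frames, block_size=4)
print(features.shape)
```

In this sketch the intra-block step only sees frames within one block, so long-range context reaches a frame solely through the block-level attention. That is one simple way to combine local structure with long-range dependencies; the paper's actual modules, pooling, and fusion may differ.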

Keywords: Video summarization, Hierarchical structure, Attention models, Multiscale temporal representation, Two-stream framework

Review history: Received 22 October 2020, Revised 12 July 2021, Accepted 9 September 2021, Available online 20 September 2021, Version of Record 24 September 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2021.108312