Spatiotemporal distilled dense-connectivity network for video action recognition

Authors:

Highlights:

• We propose a novel Spatiotemporal Distilled Dense-Connectivity Network (SDDN) for action recognition.

• We propose to generalize dense-connectivity to the spatiotemporal domain by building block-level dense connections between the appearance and motion streams, permitting effective spatiotemporal interaction at the feature representation layers.

• We propose a novel knowledge distillation module, composed of two students and a teacher, that allows the appearance and motion streams to interact effectively at the high-level layers.

• Our model obtains promising action recognition performance on two benchmark datasets, UCF101 and HMDB51.
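
The highlights name a distillation module with two students and a teacher but do not give its loss. As a rough illustration only, the sketch below uses the standard temperature-scaled distillation objective (hard-label cross-entropy plus a KL term toward the teacher's softened predictions), applied once per student. It is not the paper's formulation: the assignment of the appearance and motion streams to the two student roles, the function names, the temperature, and the weight `alpha` are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label] + eps)


def student_loss(student_logits, teacher_logits, label,
                 temperature=2.0, alpha=0.5):
    """Hypothetical per-student loss: hard-label term plus soft teacher term."""
    hard = cross_entropy(softmax(student_logits), label)
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    # The T^2 factor keeps the soft term's gradient scale comparable
    # across temperatures in the standard formulation.
    return (1.0 - alpha) * hard + alpha * temperature ** 2 * soft


def two_student_loss(appearance_logits, motion_logits, teacher_logits,
                     label, **kwargs):
    """Sum of the two student losses (assumed roles: appearance and motion)."""
    return (student_loss(appearance_logits, teacher_logits, label, **kwargs)
            + student_loss(motion_logits, teacher_logits, label, **kwargs))
```

In this sketch, a student whose logits already match the teacher's pays only the weighted hard-label term, so the soft term acts purely as a pull toward the teacher's class distribution.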

Keywords: Two-stream, Action recognition, Dense-connectivity, Knowledge distillation

Review history: Received 13 August 2018, Revised 16 January 2019, Accepted 2 March 2019, Available online 9 March 2019, Version of Record 21 March 2019.

Official paper URL: https://doi.org/10.1016/j.patcog.2019.03.005