Multi-stream CNN: Learning representations based on human-related regions for action recognition

Authors:

Highlights:

• Presenting a multi-stream CNN architecture to incorporate multiple complementary features trained in appearance and motion networks.

• Demonstrating that using full-frame, human body, and motion-salient body part regions together effectively improves recognition performance.

• Proposing methods to precisely detect the actor and the motion-salient body part.

• Verifying that high-quality flow is critically important to learn accurate video representations for action recognition.
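The highlights do not say how the streams are combined or how the motion-salient body part is located. The Python sketch below is only a rough illustration and rests on two assumptions that this summary does not state. First, the motion-salient part is taken to be the fixed-size window inside the actor box with the largest summed optical-flow magnitude. Second, the six streams (three regions × two modalities) are fused by averaging their softmax scores. The names `REGIONS`, `MODALITIES`, `motion_salient_box`, and `late_fusion` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stream layout (an assumption, not taken from the paper):
# three input regions, each fed to an appearance and a motion network.
REGIONS = ("full_frame", "human_body", "salient_part")
MODALITIES = ("appearance", "motion")

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one stream's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def motion_salient_box(flow_mag: List[List[float]], actor_box: Box, cell: int) -> Box:
    """Hypothetical heuristic: pick the cell x cell window inside the actor box
    whose summed optical-flow magnitude is largest (first maximum wins on ties)."""
    top, left, bottom, right = actor_box
    best: Optional[Box] = None
    best_score = -math.inf
    for r in range(top, bottom - cell + 1):
        for c in range(left, right - cell + 1):
            score = sum(flow_mag[i][j]
                        for i in range(r, r + cell)
                        for j in range(c, c + cell))
            if score > best_score:
                best, best_score = (r, c, r + cell, c + cell), score
    if best is None:
        raise ValueError("cell is larger than the actor box")
    return best


def late_fusion(stream_logits: Dict[str, Sequence[float]],
                weights: Optional[Dict[str, float]] = None) -> Tuple[int, List[float]]:
    """Assumed fusion rule: weighted average of per-stream softmax scores.
    Returns (predicted class index, fused class probabilities)."""
    names = sorted(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_w = sum(weights[name] for name in names)
    num_classes = len(stream_logits[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        for k, p in enumerate(softmax(stream_logits[name])):
            fused[k] += weights[name] * p / total_w
    pred = max(range(num_classes), key=lambda k: fused[k])
    return pred, fused


if __name__ == "__main__":
    # Toy 3-class scores: five streams favour class 1, while the
    # salient-part motion stream alone favours class 0.
    logits = {f"{r}/{m}": [0.1, 2.0, 0.5] for r in REGIONS for m in MODALITIES}
    logits["salient_part/motion"] = [3.0, 0.0, 0.0]
    pred, fused = late_fusion(logits)
    print(pred, [round(p, 3) for p in fused])
```

In the toy demo, the five streams that favour class 1 outweigh the single dissenting stream, so the fused prediction is class 1. Score averaging is only one possible fusion choice. Concatenating per-stream features and training a classifier on top would be an equally plausible reading of the highlights.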


Keywords: Convolutional Neural Network, Action recognition, Multi-Stream, Motion salient region

Review history: Received 13 May 2017, Revised 10 January 2018, Accepted 24 January 2018, Available online 10 February 2018, Version of Record 10 February 2018.

Official paper URL: https://doi.org/10.1016/j.patcog.2018.01.020