Few-shot activity recognition with cross-modal memory network

Authors:

Highlights:

• We propose an end-to-end framework for few-shot activity recognition, which consists of a deep embedding module, a cross-modal memory module, and a few-shot activity recognition module.

• We design an innovative cross-modal memory structure, where each memory slot is a visual-textual embedding pair that stores the multi-modal semantic information for one activity attribute (a minimal sketch is given after this list).

• We conduct extensive experiments on the HMDB51 and UCF101 datasets to illustrate the effectiveness and superiority of the cross-modal memory network.
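
To make the memory-slot idea in the second highlight concrete, the following is a minimal PyTorch sketch of a cross-modal memory whose slots pair a visual key with a textual value and are read by attention from a video embedding. The class name, slot count, and dimensions (CrossModalMemory, num_slots=64, 512-d visual, 300-d textual) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalMemory(nn.Module):
    """Illustrative cross-modal memory: each slot pairs a visual and a
    textual embedding for one activity attribute (hypothetical layout)."""

    def __init__(self, num_slots=64, visual_dim=512, text_dim=300):
        super().__init__()
        # One visual key and one textual value per memory slot.
        self.visual_keys = nn.Parameter(torch.randn(num_slots, visual_dim))
        self.text_values = nn.Parameter(torch.randn(num_slots, text_dim))

    def forward(self, query):
        # query: (batch, visual_dim) video embedding from a deep embedding module.
        # Address slots by cosine similarity between the query and the visual keys.
        attn = F.softmax(
            F.normalize(query, dim=-1) @ F.normalize(self.visual_keys, dim=-1).T,
            dim=-1,
        )  # (batch, num_slots)
        # Read out a semantic representation as the attention-weighted
        # sum of the slots' textual embeddings.
        return attn @ self.text_values  # (batch, text_dim)


if __name__ == "__main__":
    memory = CrossModalMemory()
    video_embedding = torch.randn(4, 512)       # e.g., a 4-video episode
    semantic_readout = memory(video_embedding)  # (4, 300)
    print(semantic_readout.shape)
```

The readout could then be concatenated with the original video embedding before few-shot classification; that fusion step is omitted here for brevity.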

Keywords: Few-shot learning, Activity recognition, Cross-modal memory

Article history: Received 13 August 2019, Revised 6 January 2020, Accepted 25 March 2020, Available online 30 July 2020, Version of Record 4 August 2020.

DOI: https://doi.org/10.1016/j.patcog.2020.107348