Long video question answering: A Matching-guided Attention Model

Authors:

Highlights:

• We study long video QA, a rarely investigated but practically important problem that is relevant to many long video tasks.

• We propose a Matching-guided Attention Model (MAM) to address the long video QA problem. MAM jointly matches and regresses video snippets for each question and predicts the answer from the attended visual features.

• We generate two new datasets (a simple one and a complex one) of long videos with paired questions and answers, which can be used to evaluate methods for the long video QA problem. Experimental results demonstrate the effectiveness of the proposed method in comparison with two short video QA methods and a baseline method.
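
The idea of attending to video snippets according to how well they match a question can be illustrated with a minimal sketch. This is not the authors' MAM implementation: the dot-product matching score, the softmax weighting, the function name `matching_guided_attention`, and the toy feature vectors are all assumptions made for illustration, and the snippet regression and answer prediction steps are left out.

```python
import math


def softmax(scores):
    """Turn raw matching scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matching_guided_attention(question_vec, snippet_feats):
    """Weight each snippet by its match to the question (hypothetical helper).

    Assumption: the matching score is a plain dot product between the
    question vector and each snippet feature vector.
    """
    scores = [
        sum(q * v for q, v in zip(question_vec, feat))
        for feat in snippet_feats
    ]
    weights = softmax(scores)
    dim = len(snippet_feats[0])
    # Attended visual feature: attention-weighted sum of snippet features.
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, snippet_feats))
        for d in range(dim)
    ]
    return weights, attended


# Toy example: one question vector and three snippet features (made-up values).
question = [1.0, 0.0]
snippets = [[0.1, 0.9], [2.0, 0.2], [0.3, 0.3]]
weights, attended = matching_guided_attention(question, snippets)
best = max(range(len(weights)), key=weights.__getitem__)
```

In this toy setup, `best` is the index of the snippet with the highest attention weight, and `attended` is the feature vector that an answer predictor would consume in a full model.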

Keywords: Long video QA, Matching-guided attention

Review history: Received 6 June 2019, Revised 29 December 2019, Accepted 26 January 2020, Available online 30 January 2020, Version of Record 6 February 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107248