A survey of methods, datasets and evaluation metrics for visual question answering

Authors:

Highlights:

• We discuss current visual question answering methods, organized by fusion technique.

• We review 20 existing VQA datasets and provide a critical discussion of them.

• We discuss new evaluation metrics alongside the traditional ones.

• We discuss challenges and opportunities in visual question answering.

Keywords: Computer vision, Natural language processing, Deep neural networks, World knowledge, Attention

Review history: Received 8 September 2021, Revised 2 October 2021, Accepted 8 October 2021, Available online 15 October 2021, Version of Record 27 October 2021.

Official paper URL: https://doi.org/10.1016/j.imavis.2021.104327