Improving visual question answering using dropout and enhanced question encoder

Authors:

Highlights:

• A simple but effective coherent dropout is proposed to better prevent overfitting in VQA models.

• A siamese dropout mechanism is proposed to explicitly reduce the output variance of the VQA model during training.

• A deeper and wider encoding module, Multi-path Stacked Residual RNNs, is developed to strengthen the representational capacity of the question encoder.

• The proposed methods bring clear improvements to state-of-the-art VQA models on the VQA-v1 and VQA-v2 datasets.
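
The highlights do not spell out how siamese dropout reduces output variance. One plausible reading, sketched below purely as an assumption and not as the authors' formulation, is to pass the same input through two weight-sharing branches that draw independent dropout masks, then penalize the gap between the two outputs. The toy dense layer, the function names, and the squared-difference penalty are all hypothetical choices made for illustration.

```python
import random


def dropout(vec, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def linear(vec, weights):
    """Toy stand-in for a model head: one dense layer without bias."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def siamese_consistency_loss(features, weights, p, rng):
    """Assumed form of the penalty: two branches share weights but sample
    independent dropout masks; return the mean squared output difference."""
    out_a = linear(dropout(features, p, rng), weights)
    out_b = linear(dropout(features, p, rng), weights)
    return sum((a - b) ** 2 for a, b in zip(out_a, out_b)) / len(out_a)


rng = random.Random(0)
features = [0.5, -1.0, 2.0, 0.25]           # hypothetical fused features
weights = [[0.1, 0.2, -0.3, 0.4],           # hypothetical classifier weights
           [-0.5, 0.6, 0.7, -0.8]]
loss = siamese_consistency_loss(features, weights, p=0.5, rng=rng)
print("consistency penalty:", loss)
```

In a real training loop, a term like this would presumably be added to the usual answer-classification loss with some weighting factor. With `p=0.0` both branches see identical inputs, so the penalty vanishes.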

Keywords: Visual question answering, Coherent dropout, Siamese dropout, Enhanced question encoder

Review history: Received 20 June 2018, Revised 27 November 2018, Accepted 25 January 2019, Available online 28 January 2019, Version of Record 14 February 2019.

Paper URL: https://doi.org/10.1016/j.patcog.2019.01.038