On the role of question encoder sequence model in robust visual question answering
Authors:
Highlights:
• The question-encoder sequence model plays a significant role in overfitting VQA models to the language biases of the training set, which reduces performance on Out-of-Distribution test sets.
• A comprehensive study of how existing RNN-based and Transformer-based question-encoders affect Out-of-Distribution performance in VQA.
• Proposal of a novel question-encoder, GAT-QE, for VQA that shows better resilience to language biases and improves Out-of-Distribution performance even without additional bias-mitigation approaches.
Keywords: Visual question answering, Out-of-distribution performance, Gated recurrent unit, Transformer, Graph attention network
Review history: Received 21 December 2021, Revised 21 June 2022, Accepted 29 June 2022, Available online 3 July 2022, Version of Record 9 July 2022.
Paper URL: https://doi.org/10.1016/j.patcog.2022.108883