Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering

Authors:

Highlights:

• A Multi-Tier Attention Network (MTAN) is proposed for the AI-complete task of VQA.

• Employ term-weighted question guided visual attention to find apt visual features.

• Propose a semantic term weighting scheme to create insightful question embedding.

• MTAN achieves competitive performance against state-of-the-art methods on DAQUAR.

• Present ablation analysis to prove the contribution of each component of MTAN.

Keywords: Attention mechanism, Deep learning, Semantic similarity, Supervised term weighting, Visual Question Answering

Review history: Received 30 August 2020, Revised 31 August 2021, Accepted 3 September 2021, Available online 6 September 2021, Version of Record 22 September 2021.

Official paper URL: https://doi.org/10.1016/j.imavis.2021.104291