Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering

Authors:

Highlights:

• A Multi-Tier Attention Network (MTAN) is proposed for the AI-complete task of VQA.

• Employ term-weighted, question-guided visual attention to find apt visual features.

• Propose a semantic term weighting scheme to create informative question embeddings.

• MTAN achieves competitive performance against state-of-the-art methods on DAQUAR.

• Present ablation analysis to prove the contribution of each component of MTAN.
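The highlights name two ingredients, a term-weighted question embedding and question-guided visual attention, without giving their formulation. As a rough illustration only, the sketch below shows a generic version of each: the question vector is a weighted average of word vectors, and image-region features are pooled with softmax attention driven by dot-product scores against that vector. This is not MTAN's actual method. The term weights are arbitrary placeholders (the paper's semantic term weighting scheme is not reproduced here), and the function names, toy vectors, and the assumption that words and regions share one embedding dimension are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def question_embedding(tokens: List[str],
                       word_vecs: Dict[str, List[float]],
                       term_weights: Dict[str, float]) -> List[float]:
    """Weighted average of word vectors (generic term-weighting sketch).

    term_weights is a hypothetical placeholder; unknown terms default to 1.0
    and out-of-vocabulary tokens are skipped.
    """
    dim = len(next(iter(word_vecs.values())))
    acc = [0.0] * dim
    total = 0.0
    for tok in tokens:
        if tok not in word_vecs:
            continue
        w = term_weights.get(tok, 1.0)
        for i, x in enumerate(word_vecs[tok]):
            acc[i] += w * x
        total += w
    return [a / total for a in acc] if total > 0 else acc


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(question_vec: List[float],
           region_feats: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Question-guided soft attention over region features.

    Assumes question and region features already live in the same space
    (a real model would typically learn projections into a joint space).
    Returns the attention weights and the attention-weighted region feature.
    """
    scores = [sum(q * r for q, r in zip(question_vec, region))
              for region in region_feats]
    alpha = softmax(scores)
    dim = len(region_feats[0])
    attended = [sum(a * region[i] for a, region in zip(alpha, region_feats))
                for i in range(dim)]
    return alpha, attended


# Toy usage with made-up 2-D vectors and hypothetical weights that
# down-weight function words and up-weight content words.
word_vecs = {"what": [0.1, 0.0], "color": [0.0, 1.0],
             "is": [0.1, 0.1], "chair": [1.0, 0.0]}
term_weights = {"what": 0.1, "is": 0.1, "color": 2.0, "chair": 2.0}
q = question_embedding(["what", "color", "is", "the", "chair"],
                       word_vecs, term_weights)
regions = [[1.0, 1.0], [0.0, 0.0], [-1.0, 0.0]]
alpha, attended = attend(q, regions)
```

Down-weighting stop words keeps the question vector dominated by content terms such as "color" and "chair", so the region most aligned with them receives the largest attention weight.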


Keywords: Attention mechanism, Deep learning, Semantic similarity, Supervised term weighting, Visual Question Answering

Review process: Received 30 August 2020, Revised 31 August 2021, Accepted 3 September 2021, Available online 6 September 2021, Version of Record 22 September 2021.

Paper URL: https://doi.org/10.1016/j.imavis.2021.104291