Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering
Authors:
Highlights:
• A Multi-Tier Attention Network (MTAN) is proposed for the AI-complete task of VQA.
• Term-weighted, question-guided visual attention is employed to locate relevant visual features.
• A semantic term weighting scheme is proposed to create an informative question embedding.
• MTAN achieves competitive performance against state-of-the-art methods on the DAQUAR dataset.
• An ablation analysis demonstrates the contribution of each component of MTAN.
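The highlights name two mechanisms, a term-weighted question embedding and question-guided visual attention, without describing them. The sketch below illustrates only the general ideas. It assumes the question embedding is a weighted average of word vectors and that attention is standard additive soft attention over image-region features. The term weights, dimensions, random parameters and function names are all hypothetical. This is not MTAN's implementation, and it does not reproduce the paper's semantic term weighting scheme.

```python
import numpy as np

def term_weighted_question_embedding(word_vecs, term_weights):
    """Weighted average of word vectors (T, d) using hypothetical per-term weights (T,)."""
    w = np.asarray(term_weights, dtype=float)
    w = w / w.sum()                      # normalise so the weights sum to 1
    return w @ word_vecs                 # (T,) @ (T, d) -> (d,)

def softmax(x):
    e = np.exp(x - x.max())              # subtract max for numerical stability
    return e / e.sum()

def question_guided_attention(region_feats, q, W_v, W_q, w_a):
    """Generic additive soft attention over K image regions, guided by question vector q."""
    h = np.tanh(region_feats @ W_v + q @ W_q)   # (K, k): joint region/question projection
    alpha = softmax(h @ w_a)                    # (K,): one attention weight per region
    return alpha, alpha @ region_feats          # weights and attended visual feature (d_v,)

# Toy usage with random (hypothetical) features and parameters.
rng = np.random.default_rng(0)
T, d, K, d_v, k = 4, 8, 6, 10, 5
word_vecs = rng.normal(size=(T, d))             # one vector per question word
term_weights = [0.1, 2.0, 1.5, 0.4]             # hypothetical importance of each word
q = term_weighted_question_embedding(word_vecs, term_weights)

regions = rng.normal(size=(K, d_v))             # one feature vector per image region
W_v = rng.normal(size=(d_v, k))
W_q = rng.normal(size=(d, k))
w_a = rng.normal(size=k)
alpha, v_att = question_guided_attention(regions, q, W_v, W_q, w_a)
print(alpha.shape, v_att.shape)                 # (6,) (10,)
```

Weighting terms before averaging lets words judged informative dominate the question vector, which in turn steers the attention weights toward the regions that question is about.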
Keywords: Attention mechanism, Deep learning, Semantic similarity, Supervised term weighting, Visual Question Answering
Publication history: Received 30 August 2020, Revised 31 August 2021, Accepted 3 September 2021, Available online 6 September 2021, Version of Record 22 September 2021.
Paper URL: https://doi.org/10.1016/j.imavis.2021.104291