CAAN: Context-Aware Attention Network for Visual Question Answering
Authors:
Highlights:
• Introduce contextual information into the VQA task for the first time and propose a context-aware model, CAAN.
• Employ the positional relationship between image regions and the whole image to obtain a context-enhanced visual representation (see the sketch after this list).
• First introduce question contextual information to enhance the question feature representation in VQA.
• Achieve significant improvements over, or performance comparable to, other state-of-the-art VQA models.
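The highlights only name the mechanism, so below is a minimal, hypothetical sketch (PyTorch) of how the region-to-image positional context from the second highlight might be folded into visual features before attention. None of the names or dimensions (PositionAwareRegionEncoder, visual_dim=2048, hidden_dim=512) come from the paper; this is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): fuse each region's absolute
# position within the image into its visual feature before attention.
import torch
import torch.nn as nn

class PositionAwareRegionEncoder(nn.Module):
    """Fuses normalized bounding-box geometry into region features."""

    def __init__(self, visual_dim: int = 2048, hidden_dim: int = 512):
        super().__init__()
        # Five geometry values per region: normalized x1, y1, x2, y2
        # plus the region's area relative to the whole image.
        self.pos_proj = nn.Linear(5, hidden_dim)
        self.vis_proj = nn.Linear(visual_dim, hidden_dim)

    def forward(self, regions: torch.Tensor, boxes: torch.Tensor,
                image_wh: torch.Tensor) -> torch.Tensor:
        # regions:  (batch, n_regions, visual_dim) region features
        # boxes:    (batch, n_regions, 4) pixel coords (x1, y1, x2, y2)
        # image_wh: (batch, 2) image width and height in pixels
        w = image_wh[:, 0].unsqueeze(1)        # (batch, 1)
        h = image_wh[:, 1].unsqueeze(1)
        x1, y1, x2, y2 = boxes.unbind(dim=-1)  # each (batch, n_regions)
        # Normalizing by image size encodes where each region sits
        # inside the image, i.e. region-to-image positional context.
        geom = torch.stack(
            [x1 / w, y1 / h, x2 / w, y2 / h,
             (x2 - x1) * (y2 - y1) / (w * h)],
            dim=-1)                            # (batch, n_regions, 5)
        # Context-enhanced visual representation: sum of both streams,
        # ready to be fed into a standard attention module.
        return self.vis_proj(regions) + self.pos_proj(geom)
```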
Keywords: Visual question answering, Attention mechanism, Understanding bias, Absolute position, Contextual information
Article history: Received 1 November 2021; Revised 4 June 2022; Accepted 13 August 2022; Available online 15 August 2022; Version of Record 20 August 2022.
DOI: https://doi.org/10.1016/j.patcog.2022.108980