Multi-scale relation reasoning for multi-modal Visual Question Answering
Authors:
Highlights:
• A multi-scale design that reflects how VQA typically involves multiple objects.
• Regional attention that selects informative, question-related regions.
• Three carefully designed stages for multimodal fusion of the textual question and the visual image.
Keywords: Multi-modal data, Visual Question Answering, Multi-scale relation reasoning, Attention model
Review history: Received 23 August 2020, Revised 5 May 2021, Accepted 6 May 2021, Available online 14 May 2021, Version of Record 17 May 2021.
Official paper URL: https://doi.org/10.1016/j.image.2021.116319