Comprehensive-perception dynamic reasoning for visual question answering
Authors:
Highlights:
• The proposed comprehensive-perception dynamic reasoning model can perceive all the object features from the previous reasoning process.
• Introducing a relation network to guide the interaction between features enhances the model's relational reasoning capability.
• Employing intra- and inter-layer attention weights optimizes the importance of object features in the reasoning process.
• Incorporating our CPDR module into VLP models brings considerable performance improvements.
Keywords: Cross-modal information fusion, Visual question answering, Comprehensive perception, Relational reasoning
Review history: Received 27 December 2021, Revised 30 May 2022, Accepted 26 June 2022, Available online 1 July 2022, Version of Record 9 July 2022.
Official paper URL: https://doi.org/10.1016/j.patcog.2022.108878