Comprehensive-perception dynamic reasoning for visual question answering

Authors:

Highlights:

• The proposed comprehensive-perception dynamic reasoning (CPDR) model can perceive all object features from the previous reasoning process.

• Introducing a relation network to guide the interaction between features enhances the relational reasoning capability of the model.

• Intra- and inter-layer attention weights adjust the importance of each object feature in the reasoning process.

• Incorporating the CPDR module into vision-language pre-training (VLP) models brings considerable performance improvements.
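
The highlights mention intra- and inter-layer attention weights but do not give their formulation. The following is a minimal sketch of one way such two-level weighting could work, not the authors' CPDR implementation. It assumes dot-product scoring against a question vector and softmax normalization; the function name `aggregate_layers` and the toy data are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(vectors, weights):
    # Weighted combination of equal-length vectors.
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]

def aggregate_layers(layer_features, question):
    """Fuse object features from every earlier reasoning layer (assumed design).

    layer_features: list of layers, each a list of object feature vectors.
    question: a single query vector of the same dimension.
    """
    summaries = []
    for objects in layer_features:
        # Intra-layer attention: weight objects within one layer.
        intra = softmax([dot(obj, question) for obj in objects])
        summaries.append(weighted_sum(objects, intra))
    # Inter-layer attention: weight the per-layer summaries.
    inter = softmax([dot(s, question) for s in summaries])
    return weighted_sum(summaries, inter), inter

# Toy input: two layers, two objects each, 2-dimensional features.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
question = [1.0, 0.0]
fused, inter_weights = aggregate_layers(layers, question)
```

In this sketch the fused vector draws on object features from all layers rather than only the most recent one, which is the general idea the first highlight describes.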

Keywords: Cross-modal information fusion, Visual question answering, Comprehensive perception, Relational reasoning

Review history: Received 27 December 2021, Revised 30 May 2022, Accepted 26 June 2022, Available online 1 July 2022, Version of Record 9 July 2022.

Official paper URL: https://doi.org/10.1016/j.patcog.2022.108878