Cross-modal knowledge reasoning for knowledge-based visual question answering

Authors:

Highlights:

• Using multiple knowledge graphs from the visual, semantic and factual views to represent multimodal knowledge.

• A memory-based recurrent model for multi-step knowledge reasoning over graph-structured multimodal knowledge.

• Good interpretability, revealing how knowledge is selected from the different modalities.

• Significant improvement over state-of-the-art approaches on three benchmark datasets.

Keywords: Cross-modal knowledge reasoning, Multimodal knowledge graphs, Compositional reasoning module, Knowledge-based visual question answering, Explainable reasoning

Review history: Received 12 March 2020, Revised 13 May 2020, Accepted 21 July 2020, Available online 22 July 2020, Version of Record 27 July 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107563