HVLM: Exploring Human-Like Visual Cognition and Language-Memory Network for Visual Dialog
Authors:
Highlights:
• A novel deep neural architecture HVLM is proposed for Visual Dialog.
• A dual-perspective encoding mechanism is designed to understand an image comprehensively.
• An iterative learning strategy is designed to capture fine-grained semantic interactions in the dialog history.
• Experimental results demonstrate that our proposed model outperforms other comparable models by a significant margin on benchmark datasets.
Keywords: Visual Dialog, Visual-language understanding, Dual-perspective reasoning, Simple spectral graph convolution network
Review history: Received 18 January 2022, Revised 25 June 2022, Accepted 26 June 2022, Available online 18 July 2022, Version of Record 18 July 2022.
Paper URL: https://doi.org/10.1016/j.ipm.2022.103008