Learning multimodal relationship interaction for visual relationship detection
Authors:
Highlights:
• Construct “hyper-graphs” that take relationships as nodes to investigate the associations among relationships.
• Capture relationship-wise contexts from multimodal cues with the proposed multimodal relationship interaction network.
• Introduce a relevance module, entity appearance reconstruction, and multimodal affinity attention to mitigate noise from nodes.
• Extensive experiments on two datasets show the effectiveness of the relationship interaction model.
Keywords: Visual relationship detection, Scene graph generation, Relationship context, Multimodal relationship interaction
Article history: Received 2 June 2021, Revised 22 March 2022, Accepted 12 June 2022, Available online 14 June 2022, Version of Record 30 July 2022.
DOI: https://doi.org/10.1016/j.patcog.2022.108848