Learning visual relationship and context-aware attention for image captioning
Authors:
Highlights:
• We are the first to implicitly model the visual relationship among the objects in an image with a graph neural network.
• We propose a novel visual context-aware attention mechanism to select salient visual information for sentence generation.
• Experimental results on two benchmark datasets show that our model outperforms many state-of-the-art methods.
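To illustrate the second highlight, below is a minimal sketch of a context-aware attention step, assuming the common setup in which a set of image-region features is scored against the decoder's current hidden state (the "context"). All names, dimensions, and the additive scoring form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def context_aware_attention(V, h, Wv, Wh, w):
    """Score each region feature against the context h (additive attention).

    V: (num_regions, d) region features; h: (d,) decoder hidden state.
    Wv, Wh, w are learnable projections (here random, for illustration).
    """
    scores = np.tanh(V @ Wv + h @ Wh) @ w   # one score per region
    alpha = softmax(scores)                 # attention weights sum to 1
    return alpha @ V, alpha                 # attended feature, weights

rng = np.random.default_rng(0)
V = rng.standard_normal((36, 8))    # e.g. 36 detected regions, 8-dim features
h = rng.standard_normal(8)          # decoder state at the current time step
Wv = rng.standard_normal((8, 16))
Wh = rng.standard_normal((8, 16))
w = rng.standard_normal(16)

ctx, alpha = context_aware_attention(V, h, Wv, Wh, w)
print(ctx.shape, round(float(alpha.sum()), 6))  # (8,) 1.0
```

At each decoding step the attended feature `ctx` would be fed to the language model, so the selected visual information changes with the sentence context.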
Keywords: Image captioning, Relational reasoning, Context-aware attention
Article history: Received 26 September 2018, Revised 27 September 2019, Accepted 7 October 2019, Available online 8 October 2019, Version of Record 16 October 2019.
DOI: https://doi.org/10.1016/j.patcog.2019.107075