Learning visual relationship and context-aware attention for image captioning
Authors:
Highlights:
• We are the first to implicitly model the visual relationship among the objects in an image with a graph neural network.
• We propose a novel visual context-aware attention mechanism to select salient visual information for sentence generation.
• Experimental results on two benchmark datasets show that our model outperforms many state-of-the-art methods.
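To illustrate the second highlight, below is a minimal sketch of a context-aware attention step, assuming the common setup in which a set of image-region features is scored against the decoder's current hidden state (the "context"). All names, dimensions, and the additive scoring form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def context_aware_attention(V, h, Wv, Wh, w):
    """Score each region feature against the context h (additive attention).

    V: (num_regions, d) region features; h: (d,) decoder hidden state.
    Wv, Wh, w are learnable projections (here random, for illustration).
    """
    scores = np.tanh(V @ Wv + h @ Wh) @ w   # one score per region
    alpha = softmax(scores)                 # attention weights sum to 1
    return alpha @ V, alpha                 # attended feature, weights

rng = np.random.default_rng(0)
V = rng.standard_normal((36, 8))    # e.g. 36 detected regions, 8-dim features
h = rng.standard_normal(8)          # decoder state at the current time step
Wv = rng.standard_normal((8, 16))
Wh = rng.standard_normal((8, 16))
w = rng.standard_normal(16)

ctx, alpha = context_aware_attention(V, h, Wv, Wh, w)
print(ctx.shape, round(float(alpha.sum()), 6))  # (8,) 1.0
```

At each decoding step the attended feature `ctx` would be fed to the language model, so the selected visual information changes with the sentence context.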
Keywords: Image captioning, Relational reasoning, Context-aware attention
Article history: Received 26 September 2018, Revised 27 September 2019, Accepted 7 October 2019, Available online 8 October 2019, Version of Record 16 October 2019.
DOI: https://doi.org/10.1016/j.patcog.2019.107075