Image caption generation with dual attention mechanism

Authors:

Highlights:

• We combine visual attention and textual attention to form a dual attention mechanism that guides image caption generation.

• We adopt a fully convolutional network (FCN) to predict image tags, and fuse tag generation with image caption generation to train the encoder-decoder model.

• Our proposed model achieves state-of-the-art performance on the AIC-ICC Chinese image caption dataset.
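
To make the idea of a dual attention mechanism concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's formulation: the dot-product scoring, the fusion by concatenation, and the names `attend`, `dual_attention`, `region_feats`, and `tag_embeds` are all hypothetical. It assumes the decoder hidden state, the image region features, and the tag embeddings share one dimensionality.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Soft attention: score each item by its dot product with the query,
    normalize the scores with softmax, and return the weighted sum."""
    scores = [sum(q * x for q, x in zip(query, item)) for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    context = [
        sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)
    ]
    return context, weights


def dual_attention(hidden, region_feats, tag_embeds):
    """Hypothetical fusion: attend separately over image regions (visual)
    and predicted tags (textual), then concatenate the two contexts."""
    visual_ctx, visual_w = attend(hidden, region_feats)
    textual_ctx, textual_w = attend(hidden, tag_embeds)
    return visual_ctx + textual_ctx, visual_w, textual_w


# Toy inputs: a 2-d decoder state, two image regions, three tag embeddings.
hidden = [1.0, 0.0]
region_feats = [[1.0, 0.0], [0.0, 1.0]]
tag_embeds = [[0.0, 1.0], [2.0, 0.0], [0.0, 0.0]]
fused, visual_w, textual_w = dual_attention(hidden, region_feats, tag_embeds)
```

In this sketch the fused vector would be fed to the decoder at each time step to predict the next caption word. A learned scoring function or a gated fusion could replace the dot product and the concatenation; the source does not say which the authors use.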

Keywords: Image caption generation, Textual attention, Visual attention, Dual attention, Fully convolutional network

Article history: Received 30 July 2019, Revised 13 November 2019, Accepted 30 November 2019, Available online 12 December 2019, Version of Record 12 December 2019.

Official paper URL: https://doi.org/10.1016/j.ipm.2019.102178