Hierarchical Deep Neural Network for Image Captioning

Authors: Yuting Su, Yuqian Li, Ning Xu, An-An Liu

Abstract

Automatically describing image content in natural language is a fundamental challenge for the computer vision community. Conventional methods generate sentences directly from visual information. However, visual information alone is not enough to generate fine-grained descriptions of given images. In this paper, we exploit the fusion of visual information and high-level semantic information for image captioning. We propose a hierarchical deep neural network consisting of a bottom layer and a top layer. The former extracts visual information from the image and high-level semantic information from the detected regions, while the latter integrates both with an adaptive attention mechanism for caption generation. Experimental results show performance competitive with state-of-the-art methods on the MSCOCO dataset.
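The abstract does not specify how the top layer combines the two information sources, so the following is only a minimal sketch of one common form of adaptive attention, not the authors' implementation. It assumes dot-product attention over region-level semantic vectors and a scalar sigmoid gate that balances the global visual feature against the attended semantic context. All names (`attend`, `adaptive_fuse`, `hidden`, `visual`, `regions`) and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(hidden, regions):
    """Assumed dot-product attention: weight each region's semantic
    vector by its similarity to the decoder hidden state."""
    weights = softmax([dot(hidden, r) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return context, weights


def adaptive_fuse(hidden, visual, regions):
    """Assumed gating scheme: a scalar gate in (0, 1) decides how much
    the fused vector relies on the global visual feature versus the
    attended semantic context."""
    semantic, weights = attend(hidden, regions)
    gate = 1.0 / (1.0 + math.exp(-(dot(hidden, visual) - dot(hidden, semantic))))
    fused = [gate * v + (1.0 - gate) * s for v, s in zip(visual, semantic)]
    return fused, gate, weights


# Toy inputs: a 3-dimensional decoder state, one global visual feature,
# and semantic vectors for two detected regions.
hidden = [0.5, -0.2, 0.1]
visual = [1.0, 0.0, 0.5]
regions = [[0.2, 0.9, 0.1], [0.8, 0.1, 0.3]]
fused, gate, weights = adaptive_fuse(hidden, visual, regions)
```

In a full captioning model the fused vector would feed the word decoder at each time step, and the gate and attention scores would come from learned projections rather than raw dot products.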

Keywords: Regional semantic, Image captioning, Attention mechanism

Paper URL: https://doi.org/10.1007/s11063-019-09997-5