Visual enhanced gLSTM for image captioning

Authors:

Highlights:

• A visual enhanced guiding long short-term memory is proposed for image captioning.

• Visual features combined with text are used to guide the long short-term memory.

• Visual information is added to the model to avoid gradient diminishing.

• A region-based visual enhancement method using a region of interest or a salient region is proposed.

• An image-based visual enhancement method using visual words is proposed.

Abstract

A visual enhanced guiding long short-term memory is proposed for image captioning. Visual features combined with text are used to guide the long short-term memory, and visual information is added to the model to avoid gradient diminishing. Two enhancement methods are proposed: a region-based method that uses a region of interest or a salient region, and an image-based method that uses visual words.

Keywords: Image caption, Visual enhanced-gLSTM, Bag of words, Region of interest, Salient region

Review history: Received 24 December 2019, Revised 30 April 2021, Accepted 21 June 2021, Available online 3 July 2021, Version of Record 10 July 2021.

Paper URL: https://doi.org/10.1016/j.eswa.2021.115462