Geometry Attention Transformer with position-aware LSTMs for image captioning
Authors:
Highlights:
• An improved image captioning model, GAT, is proposed on the transformer framework.
• We design an encoder augmented with a gate-controlled GSR.
• We rebuild the decoder, enhanced by position-LSTM groups.
• Ablation experiments and comparisons are performed on the COCO and Flickr30K datasets.
Keywords: Image captioning, Transformer framework, Gate-controlled geometry attention, Position-aware LSTM
Article history: Received 22 October 2021, Revised 31 March 2022, Accepted 1 April 2022, Available online 9 April 2022, Version of Record 19 April 2022.
DOI: https://doi.org/10.1016/j.eswa.2022.117174