Semantic similarity information discrimination for video captioning

Authors:

Highlights:

• We propose a semantic discrimination network (SDN) for video captioning.

• Visual tags are introduced to bridge the gap between vision and language.

• We build a semantic bilinear block to distinguish similar but not identical visual tags.

• Experimental results show that our model outperforms state-of-the-art methods.

Keywords: SDN, Semantic Discrimination Network, CMB, Channel Mixing Block, LAB, Linear Attention Block, SBB, Semantic Bilinear Block, S-LSTM, Semantic Compositional Network Long Short-Term Memory, Video captioning, Semantic detection, Bilinear pooling, Channel attention, Natural language processing

Review history: Received 30 March 2022, Revised 3 October 2022, Accepted 4 October 2022, Available online 13 October 2022, Version of Record 21 October 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.118985