Semantic similarity information discrimination for video captioning
Authors:
Highlights:
• We propose a semantic discrimination network (SDN) for video captioning.
• Visual tags are introduced to bridge the gap between vision and language.
• We build a semantic bilinear block to distinguish similar but not identical visual tags.
• Experimental results show that our model is superior to the state-of-the-art methods.
Keywords: SDN, Semantic Discrimination Network, CMB, Channel Mixing Block, LAB, Linear Attention Block, SBB, Semantic Bilinear Block, S-LSTM, Semantic Compositional Network Long Short-Term Memory, Video captioning, Semantic detection, Bilinear pooling, Channel attention, Natural language processing
Review history: Received 30 March 2022, Revised 3 October 2022, Accepted 4 October 2022, Available online 13 October 2022, Version of Record 21 October 2022.
Official paper URL: https://doi.org/10.1016/j.eswa.2022.118985