Adaptively Converting Auxiliary Attributes and Textual Embedding for Video Captioning Based on BiLSTM

Authors: Shuqin Chen, Xian Zhong, Lin Li, Wenxuan Liu, Cheng Gu, Luo Zhong

Abstract

Automatically generating captions for videos is highly challenging because it is a cross-modal task that involves both vision and text. Most existing models generate caption words based only on visual content features, ignoring important underlying semantic information. Because the relationship between explicit semantics and hidden visual content is not holistically exploited, these models struggle to produce accurate fine-grained captions from a global view. To better extract and integrate semantic information, we propose a novel encoder-decoder framework, a bi-directional long short-term memory with an attention model and a conversion gate (BiLSTM-CG), which transfers auxiliary attributes and then generates detailed captions. Specifically, we extract semantic attributes from sliced frames in a multiple-instance learning (MIL) manner; MIL algorithms attempt to learn a classification function that can predict the labels of bags and/or instances in the visual content. In the encoding stage, we adopt 2D and 3D convolutional neural networks to encode video clips and then feed the concatenated features into a BiLSTM. In the decoding stage, we design a CG that adaptively fuses semantic attributes into hidden features at the word level, converting auxiliary attributes and textual embeddings for video captioning. Furthermore, the CG can automatically decide the optimal time step at which to capture the explicit semantics, or instead rely on the hidden states of the language model, to generate the next word. Extensive experiments on the MSR-VTT and MSVD video captioning datasets demonstrate the effectiveness of our method compared with state-of-the-art approaches.
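
The gating idea in the decoding stage can be illustrated with a minimal sketch. The abstract does not give the CG equations, so everything below is an assumption made for illustration only: a per-dimension sigmoid gate, the function name `conversion_gate`, and the parameters `w_h`, `w_a`, and `bias` are all hypothetical and are not taken from the paper. The sketch shows only the general pattern of a learned gate choosing, at each word step, between an attribute vector and the language model's hidden state.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def conversion_gate(hidden, attributes, w_h, w_a, bias):
    """Blend an attribute vector into a hidden state with a per-dimension gate.

    Hypothetical illustration, not the paper's formulation.
    hidden, attributes: equal-length lists of floats for one decoding step.
    w_h, w_a, bias: assumed per-dimension gate parameters (lists of floats).
    Returns (fused, gates), both lists of the same length as the inputs.
    """
    fused, gates = [], []
    for h, a, wh, wa, b in zip(hidden, attributes, w_h, w_a, bias):
        # Gate near 1 favors the semantic attribute; near 0 favors the hidden state.
        g = sigmoid(wh * h + wa * a + b)
        gates.append(g)
        fused.append(g * a + (1.0 - g) * h)
    return fused, gates


# Example step: a strongly positive bias makes the gate rely on the attributes,
# a strongly negative bias makes it rely on the hidden state.
hidden = [0.2, -0.5, 0.9]
attributes = [1.0, 0.0, 0.3]
zeros = [0.0, 0.0, 0.0]
use_attributes, _ = conversion_gate(hidden, attributes, zeros, zeros, [50.0] * 3)
use_hidden, _ = conversion_gate(hidden, attributes, zeros, zeros, [-50.0] * 3)
```

In a trained model the gate parameters would be learned jointly with the decoder, and the gate would typically be a full linear layer over the hidden state and attribute vector rather than the per-dimension weights assumed here.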

Keywords: Video captioning, Bi-directional long short-term memory, Multiple-instance learning, Semantic fine-grained attributes, Attention mechanism, Conversion gate

Paper URL: https://doi.org/10.1007/s11063-020-10352-2