Learning Cooperative Neural Modules for Stylized Image Captioning
Authors: Xinxiao Wu, Wentian Zhao, Jiebo Luo
Abstract
Recent progress in stylized image captioning has been achieved through the encoder-decoder framework, which generates a sentence in a one-pass decoding process. However, it remains difficult for such a decoding process to simultaneously capture the syntactic structure, infer the semantic concepts, and express the linguistic style. Research in psycholinguistics has revealed that human language production involves multiple stages, starting with several rough concepts and ending with fluent sentences. With this in mind, we propose a novel stylized image captioning approach that generates stylized sentences in a multi-pass decoding process by training three cooperative neural modules under the reinforcement learning paradigm. A low-level neural module, called the syntax module, first generates the overall syntactic structure of the stylized sentence. Next, two high-level neural modules, namely the concept module and the style module, incorporate the words that describe factual content and the words that express linguistic style, respectively. Since the three modules contribute to different aspects of the stylized sentence, i.e., the fluency, the relevancy of the factual content, and the style accuracy, we encourage the modules to specialize in their own tasks by designing different rewards for different actions. We also design an attention mechanism to facilitate communication between the high-level and low-level modules. With the help of the attention mechanism, the high-level modules are able to take the global structure of the sentence into consideration and maintain consistency between the factual content and the linguistic style. Evaluations on several public benchmark datasets demonstrate that our method outperforms existing one-pass decoding methods on multiple evaluation metrics.
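The multi-pass control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the real modules are neural networks trained with reinforcement learning and linked by an attention mechanism, none of which is modeled here. The slot markers `<C>` and `<S>`, the fixed skeleton, all function names, and the reward formulas are hypothetical and chosen only to show the order of the passes and the idea of one reward per module.

```python
from typing import Dict, List, Tuple

# Hypothetical slot markers; the abstract does not specify any such tokens.
CONCEPT, STYLE = "<C>", "<S>"


def syntax_module() -> List[str]:
    """Pass 1 (low-level): emit a syntactic skeleton with open slots.
    A fixed template stands in for a learned module."""
    return ["the", STYLE, CONCEPT, "sits", "on", "the", CONCEPT]


def fill_slots(skeleton: List[str], slot: str, words: List[str]) -> List[str]:
    """Replace each occurrence of `slot` with the next word, left to right.
    Slots are left untouched once the words run out."""
    it = iter(words)
    return [next(it, tok) if tok == slot else tok for tok in skeleton]


def concept_module(skeleton: List[str], concepts: List[str]) -> List[str]:
    """Pass 2 (high-level): insert words describing factual content."""
    return fill_slots(skeleton, CONCEPT, concepts)


def style_module(skeleton: List[str], style_words: List[str]) -> List[str]:
    """Pass 3 (high-level): insert words expressing linguistic style."""
    return fill_slots(skeleton, STYLE, style_words)


def rewards(tokens: List[str], concepts: List[str],
            style_words: List[str]) -> Dict[str, float]:
    """Toy per-module rewards (assumed formulas, for illustration only)."""
    unfilled = sum(tok in (CONCEPT, STYLE) for tok in tokens)
    return {
        # Syntax: penalize slots that were never filled.
        "fluency": 1.0 - unfilled / len(tokens),
        # Concept: fraction of detected concepts that appear in the caption.
        "relevancy": sum(c in tokens for c in concepts) / max(len(concepts), 1),
        # Style: 1 if any style word appears, else 0.
        "style": float(any(s in tokens for s in style_words)),
    }


def decode(concepts: List[str],
           style_words: List[str]) -> Tuple[str, Dict[str, float]]:
    """Run the three passes in order and score the result."""
    tokens = syntax_module()
    tokens = concept_module(tokens, concepts)
    tokens = style_module(tokens, style_words)
    return " ".join(tokens), rewards(tokens, concepts, style_words)


caption, scores = decode(["dog", "sofa"], ["adorable"])
print(caption)
print(scores)
```

In this sketch each reward depends only on the aspect its module controls, which mirrors the stated goal of letting the modules specialize. How the rewards are actually computed and assigned to actions is defined in the paper itself, not here.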
Keywords: Stylized image captioning, Cooperative modular networks, Reinforcement learning, Multi-pass decoding
Paper URL: https://doi.org/10.1007/s11263-022-01636-2