Video-guided machine translation via dual-level back-translation

Authors:

Highlights:

Abstract

Video-guided machine translation aims to translate a source-language description into a target language using video information as additional spatio-temporal context. Existing methods focus on making full use of videos as auxiliary material while ignoring the semantic consistency and reducibility between the source and target languages. In addition, visual concepts are helpful for improving the alignment and translation between languages but are rarely considered. To this end, we contribute a novel solution that thoroughly investigates video-guided machine translation via dual-level back-translation. Specifically, we first exploit sentence-level back-translation to capture coarse-grained semantics. Thereafter, a video concept-level back-translation module is proposed to explore fine-grained semantic consistency and reducibility. Lastly, a multi-pattern joint learning approach is utilized to boost translation performance. Experiments on two real-world datasets demonstrate the effectiveness and rationality of the proposed solution.
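The abstract outlines a pipeline of three jointly trained objectives: forward translation with video context, sentence-level back-translation, and concept-level back-translation. The following is a minimal PyTorch sketch of how such a dual-level back-translation loss could be wired together; the module names, the additive video-fusion step, the concept-prediction head, and the loss weights are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class DualLevelBackTranslation(nn.Module):
    """Sketch of dual-level back-translation with a shared transformer.

    Assumptions (hypothetical, not from the paper): video features are
    pre-projected to d_model and fused by addition; visual concepts are
    a multi-hot label vector; no causal masks, for brevity.
    """

    def __init__(self, vocab_src, vocab_tgt, d_model=512, n_concepts=1000):
        super().__init__()
        # One transformer reused for both directions ("shared transformer").
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.src_embed = nn.Embedding(vocab_src, d_model)
        self.tgt_embed = nn.Embedding(vocab_tgt, d_model)
        self.to_tgt = nn.Linear(d_model, vocab_tgt)
        self.to_src = nn.Linear(d_model, vocab_src)
        # Hypothetical head predicting visual concepts from decoder states.
        self.concept_head = nn.Linear(d_model, n_concepts)

    def forward(self, src_ids, tgt_ids, video_feat, concept_labels):
        ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()
        # Forward translation: source text fused with video context -> target.
        src = self.src_embed(src_ids) + video_feat  # video_feat: (B, 1, d_model)
        tgt = self.tgt_embed(tgt_ids)
        hid = self.transformer(src, tgt)            # (B, T, d_model)
        loss_trans = ce(self.to_tgt(hid).flatten(0, 1), tgt_ids.flatten())
        # Sentence-level back-translation: target -> source through the same
        # shared transformer, enforcing coarse-grained reducibility.
        hid_back = self.transformer(tgt, src)        # (B, S, d_model)
        loss_sent_bt = ce(self.to_src(hid_back).flatten(0, 1), src_ids.flatten())
        # Concept-level back-translation: both directions should recover the
        # same visual concepts (fine-grained semantic consistency).
        loss_concept = (bce(self.concept_head(hid.mean(1)), concept_labels)
                        + bce(self.concept_head(hid_back.mean(1)), concept_labels))
        # Multi-pattern joint learning: weighted sum of the three objectives
        # (weights 0.5 and 0.3 are placeholders).
        return loss_trans + 0.5 * loss_sent_bt + 0.3 * loss_concept

# Example usage with dummy tensors (batch=2, src len=7, tgt len=9).
model = DualLevelBackTranslation(vocab_src=8000, vocab_tgt=8000)
src_ids = torch.randint(0, 8000, (2, 7))
tgt_ids = torch.randint(0, 8000, (2, 9))
video_feat = torch.randn(2, 1, 512)                       # pre-projected video context
concept_labels = torch.randint(0, 2, (2, 1000)).float()   # multi-hot concept labels
loss = model(src_ids, tgt_ids, video_feat, concept_labels)
loss.backward()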

Keywords: Multiple modalities, Video-guided machine translation, Back-translation, Shared transformer

Article history: Received 21 September 2021, Revised 25 February 2022, Accepted 13 March 2022, Available online 21 March 2022, Version of Record 1 April 2022.

Paper URL: https://doi.org/10.1016/j.knosys.2022.108598