Vision talks: Visual relationship-enhanced transformer for video-guided machine translation

Authors:

Highlights:

• A structured conceptual representation serves as a bridge in multi-modal fusion.

• Objects’ semantics and relationships reflect the visual content effectively.

• A graph convolutional network is used to explore relationships among visual semantics.
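
To make the last highlight concrete, here is a minimal sketch of one graph-convolution step over a small object-relationship graph. It is not the paper's model: it assumes the standard symmetric-normalized formulation ReLU(D^-1/2 (A + I) D^-1/2 H W), and the function name `gcn_layer`, the three-object scene, and all feature and weight values are hypothetical, chosen only for illustration.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W).

    adj:    n x n adjacency matrix (1.0 where two objects are related)
    feats:  n x d node features (one row per detected object)
    weight: d x o projection matrix (learned in a real model; fixed here)
    """
    n = len(adj)
    d_in, d_out = len(weight), len(weight[0])
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: a_hat[i][j] / sqrt(deg[i] * deg[j]).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features (norm @ feats).
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n))
            for d in range(d_in)] for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in)))
             for o in range(d_out)] for i in range(n)]

# Hypothetical scene: person -- horse -- field (two relationship edges).
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0],   # "person" (toy 2-d embedding)
         [0.0, 1.0],   # "horse"
         [1.0, 0.0]]   # "field"
identity = [[1.0, 0.0],
            [0.0, 1.0]]

for row in gcn_layer(adj, feats, identity):
    print([round(v, 3) for v in row])
```

After one step, each object's vector mixes in the features of the objects it is related to, which is the sense in which a graph convolution "explores relationships" among detected visual concepts. Stacking several such layers would propagate information across longer relationship chains.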


Keywords: Machine translation, Visual relationship, Transformer, Graph convolutional network

Review history: Received 28 October 2021, Revised 5 June 2022, Accepted 20 July 2022, Available online 26 July 2022, Version of Record 4 August 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.118264
