Vision talks: Visual relationship-enhanced transformer for video-guided machine translation

Authors:

Highlights:

• Structured conceptual representation serves as a bridge in multi-modal fusion.

• Objects’ semantics and relationships reflect the visual content effectively.

• A graph convolutional network is used to explore relationships among visual semantics.
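The highlights describe representing visual content as objects plus their relationships and running a graph convolutional network over that structure. As a rough illustration only, the following pure-Python sketch turns hypothetical (subject, predicate, object) triples into a graph and applies one layer of the standard GCN propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W). The object names, triples, feature sizes, and helpers (`build_adjacency`, `normalize`, `gcn_layer`) are assumptions for illustration; the sketch ignores relationship labels and is not the paper's architecture.

```python
import math
import random

# Hypothetical toy sketch of a single GCN layer over an object-relationship graph.
# It is NOT the paper's model: predicates are ignored and only connectivity is used.

def build_adjacency(objects, triples):
    """Build a symmetric adjacency matrix with self-loops (A + I)
    from (subject, predicate, object) relationship triples."""
    index = {name: i for i, name in enumerate(objects)}
    n = len(objects)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so each node keeps its own features
    for subj, _pred, obj in triples:
        i, j = index[subj], index[obj]
        adj[i][j] = adj[j][i] = 1.0  # relationships treated as undirected edges
    return adj

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2; degrees include self-loops."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(norm_adj, features, weight):
    """One GCN layer: ReLU(A_hat . H . W), mixing each node's features with its neighbours'."""
    out = matmul(matmul(norm_adj, features), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical scene: objects detected in a video frame and their relationships.
objects = ["man", "guitar", "stage"]
triples = [("man", "playing", "guitar"), ("man", "standing on", "stage")]

random.seed(0)
dim_in, dim_out = 4, 2
features = [[random.uniform(-1, 1) for _ in range(dim_in)] for _ in objects]   # H: node features
weight = [[random.uniform(-1, 1) for _ in range(dim_out)] for _ in range(dim_in)]  # W: layer weights

adj = build_adjacency(objects, triples)
hidden = gcn_layer(normalize(adj), features, weight)
print(len(hidden), len(hidden[0]))  # prints: 3 2
```

In a full model the resulting node vectors would be fed into the translation model alongside the text; how the paper performs that fusion is not described here, so that step is left out.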


Keywords: Machine translation, Visual relationship, Transformer, Graph convolutional network

Publication history: Received 28 October 2021, Revised 5 June 2022, Accepted 20 July 2022, Available online 26 July 2022, Version of Record 4 August 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.118264