Vision talks: Visual relationship-enhanced transformer for video-guided machine translation
Authors:
Highlights:
• A structured conceptual representation serves as a bridge in multi-modal fusion.
• Objects’ semantics and relationships reflect the visual content effectively.
• A graph convolutional network is used to explore relationships among visual semantics.
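
The last highlight can be illustrated with a minimal sketch of one graph convolution step over a small object-relationship graph. Everything here is an assumption made for illustration: the toy nodes ("person", "ride", "horse"), the 2-dimensional features, the identity weight matrix, and the common symmetric normalization with self-loops. The paper's actual architecture is not described in this listing, so this should not be read as the authors' implementation.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops, then scale each entry by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always > 0 because of self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product for nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(normalized_adj @ features @ weight)."""
    propagated = matmul(normalize_adjacency(adj), features)
    transformed = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in transformed]


# Hypothetical toy graph: node 0 = "person", node 1 = "ride" (relationship),
# node 2 = "horse". The relationship node links the two object nodes.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],   # illustrative 2-d embeddings, not real data
            [0.0, 1.0],
            [1.0, 1.0]]
weight = [[1.0, 0.0],     # identity weights keep the example easy to follow
          [0.0, 1.0]]

updated = gcn_layer(adj, features, weight)
```

After one step, each node's feature vector is a degree-weighted mix of its own features and those of its neighbors, so the "person" node picks up information from the "ride" relationship node it is connected to.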
Keywords: Machine translation, Visual relationship, Transformer, Graph convolutional network
Review history: Received 28 October 2021, Revised 5 June 2022, Accepted 20 July 2022, Available online 26 July 2022, Version of Record 4 August 2022.
Official paper URL: https://doi.org/10.1016/j.eswa.2022.118264