The semantic typology of visually grounded paraphrases

Authors:

Highlights:

• VGPs are different phrasal expressions describing the same visual concept in an image.

• Previous studies ignore various phenomena behind VGPs.

• We propose a semantic typology for VGPs, aiming to elucidate the VGP phenomena.

• We construct a large VGP dataset according to our typology.

• We show the importance of joint language and vision learning for VGP classification.

Keywords: Vision and language, Image interpretation, Visually grounded paraphrases, Semantic typology, Dataset

Review history: Received 8 January 2021, Revised 12 August 2021, Accepted 30 November 2021, Available online 11 December 2021, Version of Record 18 December 2021.

Paper URL: https://doi.org/10.1016/j.cviu.2021.103333