Inferring spatial relations from textual descriptions of images

Authors:

Highlights:

• A novel dataset (REC-COCO) for spatial inference from text, where textual tokens are linked to bounding boxes in images.

• Experiments showing that the contextual information in textual captions helps infer the spatial relations between objects.

• An experimental analysis of various scenarios for inferring spatial relations from text.

• Qualitative results suggesting that neural network architectures can learn prototypical spatial relations between objects.

Keywords: Text-to-image synthesis, Natural language understanding, Spatial relations, Deep learning

Review history: Received 25 June 2020, Revised 31 December 2020, Accepted 2 January 2021, Available online 27 January 2021, Version of Record 2 February 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2021.107847