Distributional semantics of objects in visual scenes in comparison to text

Authors:

Abstract

The distributional hypothesis states that the meaning of a concept is defined by the contexts in which it occurs. In practice, word co-occurrence and proximity in text corpora are often analyzed to obtain a real-valued semantic vector for a given word, which is taken to encode (at least partially) the meaning of that word. Here we transfer this idea from text to images, where pre-assigned labels of other objects or activations of convolutional neural networks serve as context. We propose a simple algorithm that extracts and processes object contexts from an image database and yields semantic vectors for objects. We show empirically that these representations perform on par with state-of-the-art distributional models over a set of conventional objects. To this end, we employ well-known word benchmarks in addition to a newly proposed object-centric benchmark.
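
As a rough illustration of the label-based idea described in the abstract, the following Python sketch counts how often object labels share an image and turns those counts into context vectors. It is not the authors' algorithm: the PPMI weighting, the function names `cooccurrence_vectors` and `cosine`, and the toy label sets are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import permutations


def cooccurrence_vectors(images):
    """Build PPMI-weighted context vectors from per-image label sets.

    `images` is an iterable of label collections, one per image.
    PPMI is an assumed weighting here, not taken from the paper.
    """
    vocab = sorted({label for labels in images for label in labels})

    # Count every ordered pair of distinct labels that share an image.
    pair_counts = Counter()
    for labels in images:
        for target, context in permutations(set(labels), 2):
            pair_counts[(target, context)] += 1

    total = sum(pair_counts.values())

    # Marginal counts; the pair counts are symmetric, so row marginals
    # equal column marginals.
    marginal = Counter()
    for (target, _), n in pair_counts.items():
        marginal[target] += n

    vectors = {}
    for target in vocab:
        vec = []
        for context in vocab:
            n = pair_counts[(target, context)]
            if n == 0:
                vec.append(0.0)
            else:
                pmi = math.log(n * total / (marginal[target] * marginal[context]))
                vec.append(max(pmi, 0.0))  # keep only positive PMI
        vectors[target] = vec
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy "image database": each set holds the labels of one image.
toy_images = [
    {"cup", "table", "chair"},
    {"mug", "table", "chair"},
    {"cup", "table"},
    {"mug", "table"},
    {"car", "road"},
    {"truck", "road"},
]

vocab, vectors = cooccurrence_vectors(toy_images)
sim_cup_mug = cosine(vectors["cup"], vectors["mug"])
sim_cup_car = cosine(vectors["cup"], vectors["car"])
print(sim_cup_mug, sim_cup_car)
```

In this toy data, "cup" and "mug" never appear in the same image but share the same neighbours, so their vectors end up closer to each other than to "car", which is the distributional effect the abstract refers to.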

Keywords: Object semantics, Vision and language, Semantics, Distributional hypothesis, Computer vision

Article history: Received 24 July 2017, Revised 31 May 2018, Accepted 4 December 2018, Available online 7 February 2019, Version of Record 28 February 2019.

Paper URL: https://doi.org/10.1016/j.artint.2018.12.009