Place perception from the fusion of different image representation

Authors:

Highlights:

• We propose a multi-task deep neural network that performs indoor place understanding and recognition jointly, imitating and learning the way humans perceive places.

• From the perspective of multi-modal information transformation and complementation, we propose an image captioning model that automatically generates natural language descriptions of place images; these descriptions serve as an additional information source to support decision-making in place recognition.

• We propose a multi-modal feature extraction and fusion architecture based on a mixed-CNN-LSTM network that gathers both visual and linguistic features corresponding to instance-level and concept-level information, respectively.

• We validate the effectiveness of using natural language descriptions for place perception through experiments on four public image datasets.
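
The fusion idea in the third highlight can be illustrated with a toy example. The sketch below is not the authors' architecture: it assumes the visual (CNN) and linguistic (LSTM) features have already been extracted as plain vectors, fuses them by simple concatenation, and scores place classes with a linear layer followed by a softmax. All names (`fuse_features`, `classify_place`) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(visual, linguistic):
    """Assumed fusion rule: concatenate the two feature vectors."""
    return list(visual) + list(linguistic)


def classify_place(fused, weights, biases):
    """Linear classifier over the fused vector, one weight row per class."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical feature vectors standing in for CNN and LSTM outputs.
visual = [0.2, 0.8]
linguistic = [1.0, 0.0, 0.5]

# Hypothetical classifier parameters for two place classes.
weights = [
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
]
biases = [0.0, 0.0]

fused = fuse_features(visual, linguistic)
probs = classify_place(fused, weights, biases)
predicted_class = probs.index(max(probs))
```

In a real system the two feature vectors would come from trained networks, and the fusion step might be learned rather than a fixed concatenation; the sketch only shows how two information sources can feed one decision.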

Keywords: Indoor place perception, CNN, LSTM, Convolutional auto-encoder, Natural language

Review history: Received 8 January 2020, Revised 7 September 2020, Accepted 23 September 2020, Available online 24 September 2020, Version of Record 8 October 2020.

Paper URL: https://doi.org/10.1016/j.patcog.2020.107680