MMSNet: Multi-modal scene recognition using multi-scale encoded features

Authors:

Highlights:

• We present a novel multi-scale and multi-modal feature learning framework for RGB-D scene recognition.

• We optimize the MLP parameters with gradients back-propagated through a simple yet effective soft-voting-based multi-modal fusion.

• Our approach captures the multi-modal relations among multi-scale RGB-D features with a loss based on the linear correlation coefficient.

• The proposed approach improves on our baseline and on the state-of-the-art results of counterpart CNN-based methods.
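The two mechanisms named in the highlights, soft-voting fusion and a loss based on the linear correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`soft_vote`, `pearson`, `correlation_loss`), the equal weighting of the two modalities, and the `1 - r` form of the loss are all assumptions made for illustration, and the paper's exact formulation may differ.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(rgb_logits, depth_logits):
    """Assumed soft voting: average the class probabilities of both modalities."""
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    return [(a + b) / 2.0 for a, b in zip(p_rgb, p_depth)]


def pearson(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # Return 0 for a constant vector, where the coefficient is undefined.
    return cov / denom if denom > 0 else 0.0


def correlation_loss(rgb_feat, depth_feat):
    """Hypothetical loss: small when RGB and depth features are correlated."""
    return 1.0 - pearson(rgb_feat, depth_feat)


if __name__ == "__main__":
    fused = soft_vote([2.0, 0.5, -1.0], [1.5, 1.0, -0.5])
    print("fused class probabilities:", fused)
    print("correlation loss:", correlation_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because both steps are differentiable, an autodiff framework could back-propagate gradients through the averaged probabilities to the per-modality classifiers, which is the property the second highlight relies on.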

Keywords: RGB-D scene recognition, Multi-modal learning, Multi-scale feature fusion

Review history: Received 10 February 2022, Revised 5 April 2022, Accepted 11 April 2022, Available online 15 April 2022, Version of Record 28 April 2022.

Paper URL: https://doi.org/10.1016/j.imavis.2022.104453