MMSNet: Multi-modal scene recognition using multi-scale encoded features
Authors:
Highlights:
• We present a novel multi-scale and multi-modal feature learning framework for RGB-D scene recognition.
• We optimize MLP parameters with gradients back-propagated through a simple yet effective soft-voting-based multi-modal fusion.
• Our approach captures the multi-modal relations among multi-scale RGB-D features with a loss based on the linear correlation coefficient.
• The proposed approach improves on our baseline and on the state-of-the-art results of counterpart CNN-based methods.
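The two mechanisms named in the highlights, soft-voting fusion and a loss based on the linear correlation coefficient, can be illustrated with a minimal sketch. This listing does not give the paper's actual formulation, so everything below is an assumption for illustration: the function names, the equal modality weights, and the `1 - r` form of the loss are hypothetical and are not taken from MMSNet.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(rgb_logits, depth_logits, w_rgb=0.5):
    """Fuse two modalities by a weighted average of class probabilities.

    Assumption: a fixed weight w_rgb; the paper may weight modalities differently.
    """
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    return [w_rgb * a + (1.0 - w_rgb) * b for a, b in zip(p_rgb, p_depth)]


def pearson(x, y, eps=1e-12):
    """Linear (Pearson) correlation coefficient between two feature vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return cov / (norm + eps)


def correlation_loss(rgb_feat, depth_feat):
    """Hypothetical loss: near 0 when features are perfectly correlated, near 2 when anti-correlated."""
    return 1.0 - pearson(rgb_feat, depth_feat)


if __name__ == "__main__":
    fused = soft_vote([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
    print("fused class probabilities:", fused)
    print("correlation loss:", correlation_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because both steps are built from differentiable operations, an automatic-differentiation framework could pass gradients from the fused prediction back to the per-modality classifiers, which is the behavior the second highlight describes.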
Keywords: RGB-D scene recognition, Multi-modal learning, Multi-scale feature fusion
Article history: Received 10 February 2022, Revised 5 April 2022, Accepted 11 April 2022, Available online 15 April 2022, Version of Record 28 April 2022.
Official paper URL: https://doi.org/10.1016/j.imavis.2022.104453