MAPNet: Multi-modal attentive pooling network for RGB-D indoor scene classification

Authors:

Highlights:

• Orderless pooling can maintain spatial invariance in local information aggregation for indoor scene classification.

• Intra-modality Attentive Pooling mines and pools discriminative local semantic cues in each modality.

• Cross-modality Attentive Pooling learns to attend to different modalities for different local cues, fusing the selected discriminative semantic cues across modalities.

• The attention weights in the model are interpretable for understanding both scene classification and RGB-D fusion.

• State-of-the-art results are achieved on both the challenging SUN RGB-D and NYU Depth V2 datasets.
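
To make the pooling ideas in the highlights concrete, the following is a minimal sketch of attention-weighted orderless pooling, not the authors' implementation. It assumes dot-product scoring against a fixed query vector and a single fusion weight per modality; the names `attentive_pool`, `cross_modal_fuse`, and `query`, and all the toy feature values, are hypothetical. In a trained network the query would be learned, and the paper's cross-modality step may be finer-grained than shown here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attentive_pool(features, query):
    """Intra-modality sketch: softmax-weighted sum of local feature vectors.

    The result does not depend on the order of `features`, which is what
    makes the pooling orderless. `query` is a hypothetical attention vector.
    """
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights

def cross_modal_fuse(rgb_vec, depth_vec, query):
    """Cross-modality sketch: softmax over the two modalities, then blend."""
    weights = softmax([dot(rgb_vec, query), dot(depth_vec, query)])
    fused = [weights[0] * r + weights[1] * d for r, d in zip(rgb_vec, depth_vec)]
    return fused, weights

# Toy local features (assumed values): three local cues per modality, 2-D each.
rgb_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
depth_feats = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]
query = [2.0, -1.0]

rgb_pooled, rgb_w = attentive_pool(rgb_feats, query)
depth_pooled, depth_w = attentive_pool(depth_feats, query)

# Reordering the local features leaves the pooled vector unchanged.
rgb_pooled_reordered, _ = attentive_pool(list(reversed(rgb_feats)), query)

fused, modality_w = cross_modal_fuse(rgb_pooled, depth_pooled, query)
```

The attention weights (`rgb_w`, `depth_w`, `modality_w`) each sum to one, so they can be read directly as the relative importance of each local cue or modality, which is the kind of interpretability the highlights refer to.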


Keywords: Indoor scene classification, Multi-modal fusion, RGB-D, Attentive pooling

Review history: Received 20 August 2018, Revised 8 January 2019, Accepted 7 February 2019, Available online 8 February 2019, Version of Record 16 February 2019.

Paper URL: https://doi.org/10.1016/j.patcog.2019.02.005