Learnable Depth-Sensitive Attention for Deep RGB-D Saliency Detection with Multi-modal Fusion Architecture Search
Authors: Peng Sun, Wenhu Zhang, Songyuan Li, Yilin Guo, Congli Song, Xi Li
Abstract
RGB-D salient object detection (SOD) is usually formulated as a problem of classification or regression over two modalities, i.e., RGB and depth. Hence, effective RGB-D feature modeling and multi-modal feature fusion both play a vital role in RGB-D SOD. In this paper, we propose a depth-sensitive RGB feature modeling scheme using the depth-wise geometric prior of salient objects. In principle, the feature modeling scheme is carried out in a Depth-Sensitive Attention Module (DSAM), which enhances RGB features and suppresses background distraction by capturing the depth geometry prior. Furthermore, we extend the original DSAM to DSAMv2 by proposing a novel Depth Attention Generation Module (DAGM) that generates learnable depth attention maps for more robust depth-sensitive RGB feature extraction. Moreover, to perform effective multi-modal feature fusion, we further present an automatic neural architecture search approach for RGB-D SOD, which effectively discovers a feasible architecture within our specially designed multi-modal multi-scale search space. Extensive experiments on nine standard benchmarks demonstrate the effectiveness of the proposed approach against state-of-the-art methods. We name the enhanced learnable Depth-Sensitive Attention and Automatic multi-modal Fusion framework DSA\(^{2}\)Fv2.
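As a rough illustration only, the PyTorch sketch below shows how a depth input could drive a learnable spatial attention map that modulates RGB features, in the spirit of DSAM/DAGM described above. The module name `DepthAttentionSketch`, the small convolutional stack, and the residual modulation are all assumptions; the abstract does not specify the actual DSAM or DAGM internals.

```python
# Minimal sketch of depth-sensitive attention (hypothetical; the paper's
# actual DSAM/DAGM designs are not given in this abstract).
import torch
import torch.nn as nn

class DepthAttentionSketch(nn.Module):
    """Generate a learnable spatial attention map from the depth input and
    use it to modulate RGB features (an assumed reading of DSAM/DAGM)."""

    def __init__(self):
        super().__init__()
        # A small conv stack over the single-channel depth map; this
        # particular architecture is an assumption for illustration.
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # attention map in [0, 1]
        )

    def forward(self, rgb_feat: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb_feat: (B, C, H, W) RGB features; depth: (B, 1, H, W) depth map.
        attn = self.depth_branch(depth)  # (B, 1, H, W), broadcast over C
        # Residual modulation: emphasize depth-salient regions while
        # preserving the original RGB signal.
        return rgb_feat * (1.0 + attn)

# Usage: enhanced = DepthAttentionSketch()(rgb_feat, depth)
```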
Keywords: RGB-D salient object detection, Depth-sensitive RGB feature, Multi-modal feature fusion
Paper URL: https://doi.org/10.1007/s11263-022-01646-0