AGRFNet: Two-stage cross-modal and multi-level attention gated recurrent fusion network for RGB-D saliency detection

Highlights:

• An attention GRU (AGRU) is proposed to enhance cross-modal fusion or multi-level fusion in a unified, attention-based structure. It selectively combines color and depth information and adaptively retains the optimal fusion result.

• To further improve the performance of the network, three modules (CFM, LRM, DEM) are proposed. They are designed to fuse high-layer semantic information, refine low-layer features, and enhance local details.

Abstract

• A cross-modal and multi-level attention gated recurrent fusion network is proposed for salient object detection in RGB-D images. It uses two-stage gated recurrent units to fuse the cross-modal and multi-level features in the decoding process.
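The idea of an attention-gated recurrent fusion step can be illustrated with a small sketch. This is not the paper's AGRU: it assumes flat feature vectors instead of convolutional feature maps, dense random weights instead of learned convolutions, and a simple sigmoid channel attention. The names `AttentionGRUCell`, `step`, and `fuse` are hypothetical and used only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class AttentionGRUCell:
    """Illustrative GRU-style fusion cell with an attention gate on the input.

    Assumption: the real AGRU works on feature maps with learned weights;
    here the weights are small random values so the sketch runs standalone.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)]
                    for _ in range(dim)]

        self.W_a = mat()  # attention weights (assumed form)
        self.W_z = mat()  # update gate
        self.W_r = mat()  # reset gate
        self.W_h = mat()  # candidate state

    def step(self, h, x):
        # Attention: weight each input channel given the current state.
        a = [sigmoid(s) for s in matvec(self.W_a, h + x)]
        x_att = [ai * xi for ai, xi in zip(a, x)]
        # Standard GRU gates computed on the attended input.
        z = [sigmoid(s) for s in matvec(self.W_z, h + x_att)]
        r = [sigmoid(s) for s in matvec(self.W_r, h + x_att)]
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(s) for s in matvec(self.W_h, gated_h + x_att)]
        # The update gate decides how much of the old fusion result is kept.
        return [(1.0 - zi) * hi + zi * ci
                for zi, hi, ci in zip(z, h, h_tilde)]


def fuse(cell, features):
    # Fold a sequence of feature vectors into one fused state. The sequence
    # can hold two modalities (RGB, depth) or several decoder levels.
    h = features[0]
    for x in features[1:]:
        h = cell.step(h, x)
    return h


if __name__ == "__main__":
    cell = AttentionGRUCell(dim=4, seed=0)
    rgb = [0.5, -0.2, 0.1, 0.9]     # made-up RGB feature vector
    depth = [0.3, 0.8, -0.6, 0.0]   # made-up depth feature vector
    print(fuse(cell, [rgb, depth]))
```

In this sketch the same cell serves both roles named above: calling `fuse` on an RGB and a depth vector mimics cross-modal fusion, while calling it on a list of per-level vectors mimics multi-level fusion.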

Article history: Received 22 May 2021, Revised 15 January 2022, Accepted 17 February 2022, Available online 4 March 2022, Version of Record 15 March 2022.