Implicit neural refinement based multi-view stereo network with adaptive correlation

Abstract

In this paper, we propose ACINR-MVSNet, an end-to-end trainable framework for multi-view stereo (MVS) with adaptive group-wise correlation and implicit neural depth refinement. Previous learning-based MVS methods have achieved strong performance, and most of them estimate depth maps in a coarse-to-fine manner. However, in the commonly used multi-stage cascaded framework, an incorrect estimate at an early stage can propagate to later stages. In contrast, we focus on another coarse-to-fine structure: a one-stage MVS architecture followed by refinement modules. Inspired by implicit neural representations, we propose an implicit neural refinement module that refines the coarse depth map. Guided by the corresponding reference image, it can better recover fine details, especially in boundary areas. To address the visibility problem in complex scenarios while maintaining efficiency, we propose an adaptive group-wise correlation similarity measure for cost volume construction. In addition, we present a pyramid-based feature extraction network with a repeated top-down and bottom-up structure that gathers more context-aware information and better handles ill-posed regions. This feature extractor is also used to construct an enhanced Gauss-Newton refinement module for further upsampling and optimization. Extensive experiments on the DTU, Tanks & Temples, and BlendedMVS datasets demonstrate the effectiveness and generalization of our approach, which achieves better or competitive results compared with state-of-the-art methods. The code will be available at https://github.com/BoyangSONG/ACINR-MVSNet.
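
To make the idea of a group-wise correlation cost volume more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the source-view features have already been warped onto D depth hypotheses in the reference view. The function names and the sigmoid-based `visibility_weight` are hypothetical stand-ins: the abstract does not specify how the adaptive weights are computed, and a learned per-view weighting network would typically take that role.

```python
import numpy as np


def groupwise_correlation(ref, src, groups):
    """Group-wise correlation between reference and warped source features.

    ref: (C, H, W) reference-view features.
    src: (C, D, H, W) source-view features warped to D depth hypotheses.
    Returns a similarity volume of shape (groups, D, H, W).
    """
    c, d, h, w = src.shape
    assert ref.shape == (c, h, w) and c % groups == 0
    ref_g = ref.reshape(groups, c // groups, 1, h, w)
    src_g = src.reshape(groups, c // groups, d, h, w)
    # Mean inner product over the channels of each group.
    return (ref_g * src_g).mean(axis=1)


def visibility_weight(sim):
    """Hypothetical per-pixel weight (assumption, not from the paper):
    a sigmoid of the similarity averaged over groups, shape (1, D, H, W)."""
    return 1.0 / (1.0 + np.exp(-sim.mean(axis=0, keepdims=True)))


def aggregate_views(similarities, weights, eps=1e-6):
    """Weighted average of per-view similarity volumes.

    similarities: list of (groups, D, H, W) arrays, one per source view.
    weights: list of (1, D, H, W) non-negative arrays, one per source view.
    """
    num = sum(w * s for w, s in zip(weights, similarities))
    den = sum(weights) + eps
    return num / den
```

In this sketch, each source view contributes one similarity volume from `groupwise_correlation`, and `aggregate_views` merges them into a single cost volume so that views with low weight (for example, occluded ones) contribute less. Reducing C channels to a small number of groups keeps the cost volume compact, which is the usual efficiency argument for group-wise correlation.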

Keywords: Multi-view stereo, Implicit neural representation, Adaptive aggregation, Feature pyramid, Coarse-to-fine

Article history: Received 20 May 2022, Accepted 17 June 2022, Available online 23 June 2022, Version of Record 25 June 2022.

Official paper URL: https://doi.org/10.1016/j.imavis.2022.104511