A lightweight network for monocular depth estimation with decoupled body and edge supervision

Authors:

Highlights:

Abstract

Learning depth from a single image is a challenging task in computer vision. Many recent works on monocular depth estimation explore increasingly large convolutional neural networks to learn monocular cues implicitly. Such methods may fail to generalize well around object boundaries, because large networks tend to distort fine details (such as edges and corners) in low-resolution layers, leading to poor depth prediction near object edges. To reduce depth errors near object boundaries, this paper proposes to explicitly decouple the depth features for the body and the edges of objects, which correspond to low- and high-frequency regions of an image, respectively. To this end, we learn a flow field that warps depth features into consistent body features and residual edge features. Decoupled supervision is then applied to both sets of features to learn body and edge depth maps explicitly. We also propose a lightweight encoder-decoder network that efficiently combines features at multiple scales to alleviate the loss of fine details in the final feature map. Extensive experiments on the NYUD-v2 and KITTI datasets demonstrate that the proposed lightweight network with depth decoupling performs comparably to state-of-the-art methods while drastically reducing the number of parameters.
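
The decoupling step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single-channel feature map, takes the flow field as a given array rather than learning it, and treats the edge features as the plain residual between the input and the warped body features. The names `warp_bilinear` and `decouple` are hypothetical.

```python
import numpy as np

def warp_bilinear(feat, flow):
    """Resample a (H, W) feature map at positions shifted by flow (H, W, 2).

    flow[..., 0] is the x offset and flow[..., 1] the y offset, in pixels.
    Sampling positions are clamped to the image bounds (an assumption).
    """
    h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    # Integer corners surrounding each sampling position
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Bilinear interpolation weights
    wx = x - x0
    wy = y - y0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

def decouple(feat, flow):
    """Split features into a warped 'body' part and a residual 'edge' part."""
    body = warp_bilinear(feat, flow)  # smooth, low-frequency component
    edge = feat - body                # residual, high-frequency component
    return body, edge

# Toy usage: in the paper the flow is learned; here it is a fixed half-pixel shift.
feat = np.arange(12, dtype=float).reshape(3, 4)
flow = np.full((3, 4, 2), 0.5)
body, edge = decouple(feat, flow)
```

By construction `body + edge` reproduces the input features, so in this sketch the two branches can be supervised separately without discarding information.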

Keywords: Monocular depth estimation, Deep learning, Lightweight network

Review history: Received 21 June 2021, Accepted 13 July 2021, Available online 27 July 2021, Version of Record 7 August 2021.

Paper URL: https://doi.org/10.1016/j.imavis.2021.104261