Semantic segmentation from remote sensor data and the exploitation of latent learning for classification of auxiliary tasks

作者:

Highlights:

摘要

In this paper we address three different aspects of semantic segmentation from remote sensor data using deep neural networks. Firstly, we focus on the semantic segmentation of buildings from remote sensor data and propose ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Uniquely, the proposed network combines the localization accuracy and use of context of the U-Net network architecture, the compact internal representations and reduced feature redundancy of the Dense blocks, and the dynamic channel-wise feature re-weighting of the Squeeze-and-Excitation(SE) blocks. The proposed network has been tested on the INRIA and AIRS benchmark datasets and is shown to outperform other state of the art. Secondly, as the building classification is typically the first step of the reconstruction process, we investigate the relationship of the classification accuracy to the reconstruction accuracy. Due to the lack of (1) scene depth information, and (2) ground-truth (blueprints) for large urban-areas, the evaluation of the 3D reconstructions is not possible. Thus, we use boundary localization as a proxy to reconstruction accuracy and perform the evaluation in 2D. A comparative quantitative analysis of reconstruction accuracies corresponding to different classification accuracies confirms the strong correlation between the two. We present the results which show a consistent and considerable reduction in the reconstruction accuracy. Finally, we present the simple yet compelling concept of latent learning and the implications it carries within the context of deep learning. We posit that a network trained on a primary task (i.e. building classification) is unintentionally learning about auxiliary tasks (e.g. the classification of road, tree, etc.) which are complementary to the primary task. Although embedded in a trained network, this latent knowledge relating to the auxiliary tasks is never externalized or immediately expressed but instead only knowledge relating to the primary task is ever output by the network. We experimentally prove this occurrence of incidental learning on the pre-trained ICT-Net and show how sub-classification of the negative label is possible without further training/fine-tuning. We present the results of our experiments and explain how knowledge about auxiliary and complementary tasks – for which the network was never trained – can be retrieved and utilized for further classification. We extensively tested the proposed technique on the ISPRS benchmark dataset which contains multi-label ground truth, and report an average classification accuracy (F1 score) of 54.29% (SD=17.03) for roads, 10.15% (SD=2.54) for cars, 24.11% (SD=5.25) for trees, 42.74% (SD=6.62) for low vegetation, and 18.30% (SD=16.08) for clutter. The source code and supplemental material is publicly available at http://www.theICTlab.org/lp/2020ICTNet/.

论文关键词:

论文评审过程:Received 22 January 2021, Revised 26 May 2021, Accepted 15 July 2021, Available online 20 July 2021, Version of Record 24 July 2021.

论文官网地址:https://doi.org/10.1016/j.cviu.2021.103251