Fast multi-resolution occlusion: a method for explaining and understanding deep neural networks

Abstract

Deep Convolutional Neural Networks (DCNNs) are highly complex and nonlinear, so it is unclear which features a DCNN model bases its decisions on and how it achieves such promising results. Two families of visualization techniques are used to interpret and explain deep models: backpropagation-based and perturbation-based algorithms. The most notable drawback of backpropagation-based visualizations is that they cannot be applied to all architectures, whereas perturbation-based visualizations are fully independent of the architecture. Perturbation-based methods, however, consume substantial computation and memory resources, which makes them slow and expensive and therefore unsuitable for many real-world applications. To address these problems, this paper presents a perturbation-based visualization method called Fast Multi-resolution Occlusion (FMO), which is efficient in time and resource consumption and is therefore practical for real-world applications. To compare FMO with five well-known perturbation-based visualization methods, namely Occlusion Test, Super-pixel Perturbation (LIME), Randomized Input Sampling (RISE), Meaningful Perturbation and Extremal Perturbation, experiments are designed to measure time consumption, visualization quality and localization accuracy. All methods are applied to five well-known DCNNs (DenseNet121, InceptionV3, InceptionResnetV2, MobileNet and ResNet50) using the common benchmark datasets ImageNet, PASCAL VOC07 and COCO14.
According to the experimental results, FMO is on average 2.32 times faster than LIME on the five models (DenseNet121, InceptionResnetV2, InceptionV3, MobileNet and ResNet50) with images from the ILSVRC2012 dataset, and 24.84 times faster than Occlusion Test, 11.87 times faster than RISE, 8.72 times faster than Meaningful Perturbation and 10.03 times faster than Extremal Perturbation on all five models with images from the common ImageNet dataset, without sacrificing visualization quality. Moreover, the methods are evaluated for localization accuracy on two challenging common datasets, PASCAL VOC07 and COCO14. The results show that FMO outperforms the compared methods in localization accuracy. FMO also extends the superimposing process of the Occlusion Test method, which yields heatmaps of higher visual quality than those of the Occlusion Test on many colorful images.
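
For readers unfamiliar with perturbation-based visualization, the following is a minimal sketch of the baseline single-resolution Occlusion Test idea that FMO builds on. It is not the FMO algorithm itself, whose details are not given in this abstract. The names `occlusion_heatmap` and `toy_score`, the patch size, the stride and the fill value are all illustrative assumptions, and `toy_score` is a hypothetical stand-in for a real DCNN class score.

```python
import numpy as np


def occlusion_heatmap(image, score_fn, patch=8, stride=8, fill=0.0):
    """Baseline sliding-window occlusion (illustrative, not FMO).

    Each patch is masked in turn; the drop in the model score is
    accumulated over the pixels the patch covered.
    """
    h, w = image.shape[:2]
    base = score_fn(image)  # score on the unperturbed image
    heat = np.zeros((h, w), dtype=np.float64)
    counts = np.zeros((h, w), dtype=np.float64)
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            drop = base - score_fn(occluded)  # large drop = important region
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    # Average over overlapping patches; avoid division by zero at uncovered pixels.
    return heat / np.maximum(counts, 1)


def toy_score(img):
    # Hypothetical "classifier": responds only to one 8x8 region.
    return float(img[8:16, 8:16].mean())


image = np.ones((32, 32, 3))
heat = occlusion_heatmap(image, toy_score)
```

Each patch costs one forward pass, so the run time of this baseline grows with the number of patches. That cost is the motivation the abstract gives for a faster method.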