End-to-End Learning of Latent Deformable Part-Based Representations for Object Detection

Authors: Taylor Mordan, Nicolas Thome, Gilles Henaff, Matthieu Cord

Abstract

Object detection methods usually represent objects through rectangular bounding boxes from which they extract features, regardless of the objects' actual shapes. In this paper, we apply deformations to regions in order to learn representations better fitted to objects. We introduce DP-FCN, a deep model implementing this idea by learning to align parts to discriminative elements of objects in a latent way, i.e., without part annotations. This approach has two main assets: it builds invariance to local transformations, thus improving recognition, and it brings geometric information to describe objects more finely, leading to more accurate localization. We further develop both aspects in a new model named DP-FCN2.0 by explicitly learning interactions between parts. Alignment is done with an in-network joint optimization of all parts based on a CRF with custom potentials, and deformations influence localization through a bilinear product. We validate our models on the PASCAL VOC and MS COCO datasets and show significant gains. DP-FCN2.0 achieves state-of-the-art results of 83.3% and 81.2% on VOC 2007 and 2012, respectively, with VOC data only.
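
The idea of aligning a part to a discriminative element without part annotation can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a single part, an exhaustive search over small displacements, and a quadratic deformation penalty, and the names `align_part`, `score_map`, `max_shift` and `penalty` are hypothetical. It does not model the joint CRF optimization or the bilinear product mentioned above.

```python
def align_part(score_map, anchor, max_shift=1, penalty=0.1):
    """Pick the displacement of one part that maximizes score minus deformation cost.

    score_map: 2D list of part scores (hypothetical input).
    anchor: (row, col) default position of the part inside the region.
    max_shift: largest displacement searched in each direction (assumption).
    penalty: weight of the quadratic deformation cost (assumption).
    """
    rows, cols = len(score_map), len(score_map[0])
    r0, c0 = anchor
    best_value, best_shift = float("-inf"), (0, 0)
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r, c = r0 + dr, c0 + dc
            if not (0 <= r < rows and 0 <= c < cols):
                continue  # skip displacements that leave the score map
            # Reward a high part score, penalize moving away from the anchor.
            value = score_map[r][c] - penalty * (dr * dr + dc * dc)
            if value > best_value:
                best_value, best_shift = value, (dr, dc)
    return best_shift, best_value


# Toy score map: the strongest response sits one cell to the right of the anchor.
scores = [
    [0.1, 0.2, 0.1],
    [0.3, 0.5, 0.9],
    [0.1, 0.2, 0.4],
]
shift, value = align_part(scores, anchor=(1, 1))
print(shift, value)
```

In this toy setting the part moves one cell to the right, because the higher score there outweighs the deformation cost. The chosen displacement is a latent variable: it is inferred from the scores rather than supervised.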

Keywords: Object detection, Fully convolutional network, Deep learning, Part-based representation, End-to-end latent part learning

Paper URL: https://doi.org/10.1007/s11263-018-1109-z