Coordinate-based anchor-free module for object detection

Authors: Zhiyong Tang, Jianbing Yang, Zhongcai Pei, Xiao Song

Abstract

Despite the impressive performance of recent state-of-the-art detectors, small-target detection, scale variation, and label ambiguity remain challenging. To tackle these issues, we present a coordinate-based anchor-free (CBAF) module for object detection. It can be used as a branch of a single-shot detector (e.g., RetinaNet or SSD) or on its own to predict output probabilities and box coordinates directly. The main idea of the CBAF module is to predict an object's category and box adjustments from a part feature and its contextual part features, which are obtained by dividing feature maps according to spatial coordinates. This is inspired by the fact that humans can infer an entire object from observing a part of it and its surroundings. The CBAF module encodes and decodes boxes in an anchor-free manner on each feature map, at different resolutions, during both training and testing. During training, we first use the proposed spatial coordinate partition layer to divide feature maps into parts of size n × n, and then use the proposed contextual building layer to fuse each part with its contextual parts. We demonstrate the CBAF module through a concrete implementation. When working in conjunction with the anchor-based RetinaNet, the CBAF module improves AP scores with nearly no additional computation. Experimental results on the MS-COCO dataset show that the CBAF module improves mAP by 1.1% over RetinaNet, and that combining the CBAF module with the anchor-based RetinaNet improves mAP by 2.2%.
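
The partition and fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say how the contextual building layer fuses parts, so the 3 × 3 neighbourhood average and the channel concatenation below are assumptions chosen for illustration, as are the function names `partition` and `fuse_with_context`. This is not the authors' implementation.

```python
import numpy as np


def partition(feature_map, n):
    """Split a (C, H, W) feature map into non-overlapping n x n parts.

    Returns an array of shape (H // n, W // n, C, n, n), indexed by the
    grid position of each part. Hypothetical helper, not from the paper.
    """
    c, h, w = feature_map.shape
    if h % n or w % n:
        raise ValueError("H and W must be divisible by n in this sketch")
    # Index [c, i, a, j, b] maps to feature_map[c, i * n + a, j * n + b].
    parts = feature_map.reshape(c, h // n, n, w // n, n)
    # Reorder to [i, j, c, a, b] so the grid position comes first.
    return parts.transpose(1, 3, 0, 2, 4)


def fuse_with_context(parts):
    """Concatenate each part with the mean of its 3 x 3 neighbourhood of parts.

    Assumption: context is the average over the part and its 8 grid
    neighbours (edge parts reuse their own border values), and fusion is
    concatenation along the channel axis. The paper may do this differently.
    """
    gh, gw = parts.shape[:2]
    # Pad only the two grid axes so border parts have a full neighbourhood.
    padded = np.pad(parts, ((1, 1), (1, 1), (0, 0), (0, 0), (0, 0)), mode="edge")
    context = np.zeros(parts.shape, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            context += padded[dy:dy + gh, dx:dx + gw]
    context /= 9.0
    # Output shape: (H // n, W // n, 2 * C, n, n).
    return np.concatenate([parts, context], axis=2)


if __name__ == "__main__":
    fm = np.random.rand(8, 16, 16)   # toy feature map: C=8, H=W=16
    parts = partition(fm, n=4)       # grid of 4 x 4 parts
    fused = fuse_with_context(parts)
    print(parts.shape, fused.shape)
```

In this sketch each fused part carries twice the channels of the input, so a prediction head applied per part would see both the local feature and a summary of its neighbours.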

Keywords: Anchor-free, Contextual part features, Object detection, Part feature

Paper URL: https://doi.org/10.1007/s10489-021-02373-8