Joint Classification and Regression for Visual Tracking with Fully Convolutional Siamese Networks

Authors: Ying Cui, Dongyan Guo, Yanyan Shao, Zhenhua Wang, Chunhua Shen, Liyan Zhang, Shengyong Chen

Abstract

Visual tracking of generic objects is one of the fundamental but challenging problems in computer vision. Here, we propose a novel fully convolutional Siamese network that solves visual tracking by directly predicting the target bounding box in an end-to-end manner. We first reformulate the visual tracking task as two subproblems: a classification problem for pixel category prediction and a regression task for estimating the object state at that pixel. With this decomposition, we design a simple yet effective classification and regression framework based on the Siamese architecture, termed SiamCAR, which consists of two subnetworks: a Siamese subnetwork for feature extraction and a classification-regression subnetwork for direct bounding box prediction. Since the proposed framework is both proposal- and anchor-free, SiamCAR avoids the tedious hyper-parameter tuning of anchors, considerably simplifying the training. To demonstrate that a much simpler tracking framework can achieve superior tracking results, we conduct extensive experiments and comparisons with state-of-the-art trackers on several challenging benchmarks. Without bells and whistles, SiamCAR achieves leading performance at real-time speed. Furthermore, the ablation study validates that the proposed framework is effective with various backbone networks and can benefit from deeper networks. Code is available at https://github.com/ohhhyeahhh/SiamCAR.
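The abstract describes a two-part architecture: a shared Siamese backbone extracts features from the template and search crops, the two feature maps are fused by cross-correlation, and per-pixel classification and regression heads predict the bounding box directly, with no anchors or proposals. The following is a minimal PyTorch sketch of that pipeline; the tiny backbone, channel sizes, crop resolutions, and head layouts are illustrative assumptions, not the configuration used in the paper (which builds on a deeper backbone such as ResNet-50).

```python
# Minimal sketch of a SiamCAR-style anchor-free tracking head (assumed layout).
import torch
import torch.nn as nn
import torch.nn.functional as F


def xcorr_depthwise(search, kernel):
    """Depthwise cross-correlation of search features with template features."""
    b, c, kh, kw = kernel.shape
    search = search.reshape(1, b * c, search.size(2), search.size(3))
    kernel = kernel.reshape(b * c, 1, kh, kw)
    out = F.conv2d(search, kernel, groups=b * c)
    return out.reshape(b, c, out.size(2), out.size(3))


class SiamCARSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Shared (Siamese) feature extractor -- a toy stand-in for the deeper
        # backbone used in the paper.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Per-pixel heads on the correlated response map:
        # classification (target vs. background) and box regression
        # (distances from each pixel to the four box sides).
        self.cls_head = nn.Conv2d(channels, 2, 1)
        self.reg_head = nn.Conv2d(channels, 4, 1)

    def forward(self, template, search):
        z = self.backbone(template)   # template branch
        x = self.backbone(search)     # search branch (shared weights)
        resp = xcorr_depthwise(x, z)  # fuse the two branches
        cls = self.cls_head(resp)            # (B, 2, H, W) class scores per pixel
        reg = F.relu(self.reg_head(resp))    # (B, 4, H, W) non-negative side offsets
        return cls, reg


if __name__ == "__main__":
    model = SiamCARSketch()
    template = torch.randn(1, 3, 127, 127)  # exemplar crop (assumed size)
    search = torch.randn(1, 3, 255, 255)    # search-region crop (assumed size)
    cls, reg = model(template, search)
    print(cls.shape, reg.shape)
```

Because every location on the response map directly predicts a class score and four box offsets, there is no anchor set to design or tune, which is the simplification the abstract highlights.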

Keywords: Fully convolutional networks, Anchor-free object detection, Visual tracking


Paper link: https://doi.org/10.1007/s11263-021-01559-4