Convolutional neural net bagging for online visual tracking
Authors:
Highlights:
Abstract
Recently, Convolutional Neural Nets (CNNs) have been successfully applied to online visual tracking. A major problem, however, is that such models are prone to over-fitting for two main reasons. The first is label noise: the online training of the model relies solely on detections from previous frames. The second is model uncertainty arising from the randomized training strategy. In this work, we address both noisy labels and model uncertainty within the framework of bagging (bootstrap aggregating), resulting in efficient and effective visual tracking. Instead of keeping multiple models in a bag, we design a single multitask CNN for learning effective feature representations of the target object. In our model, every task has the same structure and shares the same set of convolutional features, but each is trained on different random samples generated for that task. A significant advantage is that the bagging overhead of our model is minimal, and no extra effort is needed to combine the outputs of different tasks, as is required in multi-lifespan models. Experiments on three recent benchmarks (over 80 video sequences) demonstrate that our CNN tracker outperforms state-of-the-art methods, illustrating the superiority of the feature representations learned by our purely online bagging framework.
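The abstract gives no implementation details, so the following PyTorch sketch is only an illustration of the core idea it describes: one shared convolutional backbone with several structurally identical task heads, each head trained on its own bootstrap resample so that the set of heads behaves like a bag of models. The class and function names (`BaggedCNNTracker`, `train_step`), the layer sizes, the head count, and the binary target/background formulation are all assumptions for the sake of a runnable example, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class BaggedCNNTracker(nn.Module):
    """Single CNN with K identical task heads over shared conv features.

    Each head plays the role of one bag member; because the backbone is
    shared and computed once, the bagging overhead is just K small heads
    instead of K full models. (Hypothetical sketch, not the paper's net.)
    """

    def __init__(self, num_heads: int = 4):
        super().__init__()
        # Shared convolutional feature extractor (sizes are illustrative).
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        # K structurally identical heads scoring target vs. background.
        self.heads = nn.ModuleList(
            nn.Linear(64 * 4 * 4, 2) for _ in range(num_heads)
        )

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        feats = self.features(x)          # backbone runs once for all heads
        return [head(feats) for head in self.heads]


def train_step(model, patches, labels, optimizer,
               loss_fn=nn.CrossEntropyLoss()):
    """One online update in which each head sees its own bootstrap resample.

    patches: (N, 3, H, W) target/background crops from recent frames;
    labels:  (N,) long tensor, 1 = target, 0 = background.
    """
    optimizer.zero_grad()
    outputs = model(patches)
    loss = 0.0
    for logits in outputs:
        # Sample with replacement so every head trains on a different
        # subset, which decorrelates the heads as in classical bagging.
        idx = torch.randint(0, patches.size(0), (patches.size(0),))
        loss = loss + loss_fn(logits[idx], labels[idx])
    loss.backward()
    optimizer.step()
    return loss.item()


# At test time, aggregate the bag by averaging the heads' scores over
# candidate windows, e.g.:
#   scores = torch.stack(model(candidates)).mean(dim=0)
```

Computing the shared features once and resampling only at the heads is what keeps the bagging overhead near zero, which is the efficiency argument the abstract makes against maintaining multiple independent models.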
Keywords:
Review timeline: Received 1 September 2015, Revised 8 April 2016, Accepted 5 July 2016, Available online 29 July 2016, Version of Record 21 November 2016.
DOI: https://doi.org/10.1016/j.cviu.2016.07.002