LSTM guided ensemble correlation filter tracking with appearance model pool

Abstract

Deep learning based visual trackers have the potential to provide good performance for object tracking. Most of them use hierarchical features learned from multiple layers of a deep network. However, issues related to deterministic aggregation of these features from various layers, difficulties in estimating variations in the scale or rotation of the object being tracked, as well as challenges in effectively modeling the object's appearance over long time periods leave substantial scope for improving performance. In this paper, we propose a tracker that learns correlation filters over features from multiple layers of a VGG network. The correlation filter for each individual layer is used to predict the target location. We adaptively learn the contribution of an ensemble of correlation filters for the final location estimate using an LSTM. An adaptive approach is advantageous because different layers encode diverse feature representations, and a uniform contribution would not fully exploit this complementary information. To this end, we use an LSTM, as it encodes interactions across past appearances, which is useful for tracking. Further, the scale and rotation parameters are estimated using respective correlation filters. Additionally, an appearance model pool is used to prevent the correlation filter from drifting. Experimental results on five public datasets — the Object Tracking Benchmark (OTB100), the Visual Object Tracking (VOT) Benchmark 2016, VOT Benchmark 2017, the Tracking Dataset, and the UAV123 Dataset — show that our approach outperforms state-of-the-art approaches for object tracking.
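The fusion step described above — per-layer correlation response maps combined with adaptively learned weights — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the response maps are random stand-ins for correlation filter outputs over different VGG layers, and the weight scores are placeholders for the values an LSTM would predict from past appearance interactions.

```python
import numpy as np

def fuse_responses(response_maps, weights):
    """Weighted sum of per-layer correlation response maps,
    followed by an argmax to estimate the target location."""
    fused = sum(w * r for w, r in zip(weights, response_maps))
    loc = np.unravel_index(np.argmax(fused), fused.shape)
    return loc, fused

# Three hypothetical per-layer response maps (stand-ins for
# correlation filter outputs over different VGG conv layers).
rng = np.random.default_rng(0)
maps = [rng.random((31, 31)) for _ in range(3)]

# Placeholder scores standing in for the LSTM output; softmax
# normalisation makes the per-layer contributions sum to one.
scores = np.array([0.5, 1.5, 1.0])
weights = np.exp(scores) / np.exp(scores).sum()

(row, col), fused = fuse_responses(maps, weights)
```

In this sketch the layer with the highest score dominates the fused map, so a confidently responding layer can outvote noisier ones — the role the abstract assigns to the LSTM's adaptive weighting.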

Article history: Received 12 December 2018, Revised 9 February 2020, Accepted 10 February 2020, Available online 25 February 2020, Version of Record 28 February 2020.

DOI: https://doi.org/10.1016/j.cviu.2020.102935