Spatio-temporal object detection by deep learning: Video-interlacing to improve multi-object tracking

Abstract

Tracking-by-detection has become a topic of great interest in computer vision applications in recent years. Existing tracking-by-detection frameworks generally struggle with congestion, occlusion, and inaccurate detection in crowded scenes. In this paper, we propose a new framework for Multi-Object Tracking-by-Detection (MOT-bD) based on a spatio-temporal interlaced encoding video model and a specialized Deep Convolutional Neural Network (DCNN) detector. The spatio-temporal variation of objects between images is encoded into “interlaced images”. A specialized “interlaced object” deep convolutional detector is trained to detect objects in interlaced images, and a classical association algorithm then associates the detected objects. Because interlaced objects are built to increase overlap during the association step, MOT performance improves over the same detector/association algorithm applied to non-interlaced images. The effectiveness and robustness of this contribution are demonstrated by experiments on popular tracking-by-detection datasets and benchmarks such as PETS, TUD, and MOT17. Experimental results show that video interlacing improves tracking performance in terms of both precision and accuracy, and illustrate the “power of video-interlacing”, which outperforms several state-of-the-art frameworks in multiple object tracking.
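
The overlap argument in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything here is an assumption made for illustration: frames are interlaced row-wise (even rows from one frame, odd rows from the next), interlaced images are built from a sliding window of consecutive frame pairs, and an “interlaced object” is approximated by the union of the object's bounding boxes in the two source frames. The function names `interlace`, `union_box`, and `iou` are hypothetical and are not taken from the paper.

```python
def interlace(frame_a, frame_b):
    """Row-wise interlacing (assumed scheme): even rows from frame_a, odd rows from frame_b.

    Frames are lists of rows; both frames must have the same number of rows.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of rows")
    return [
        list(row_a) if i % 2 == 0 else list(row_b)
        for i, (row_a, row_b) in enumerate(zip(frame_a, frame_b))
    ]


def union_box(box_a, box_b):
    """Smallest (x1, y1, x2, y2) box enclosing both boxes.

    Used here as a stand-in for the extent of an "interlaced object".
    """
    return (
        min(box_a[0], box_b[0]),
        min(box_a[1], box_b[1]),
        max(box_a[2], box_b[2]),
        max(box_a[3], box_b[3]),
    )


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A 10x20 object moving 6 px to the right per frame (made-up toy data).
    boxes = [(0, 0, 10, 20), (6, 0, 16, 20), (12, 0, 22, 20)]

    # Overlap available to the association step on plain frames t and t+1.
    plain = iou(boxes[0], boxes[1])

    # Overlap between interlaced objects built from frame pairs (t, t+1) and (t+1, t+2).
    interlaced = iou(union_box(boxes[0], boxes[1]), union_box(boxes[1], boxes[2]))

    print(f"plain IoU: {plain:.3f}, interlaced IoU: {interlaced:.3f}")
```

In this toy setting the interlaced boxes share the middle frame, so their overlap is larger than that of the single-frame boxes. With non-overlapping frame pairs the effect would not hold, which is why the sliding-window assumption is stated explicitly.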

Keywords: Multi-object tracking, Interlacing and inverse interlacing models, Specialization, Interlaced deep detector

Article history: Received 1 March 2019, Accepted 5 March 2019, Available online 28 March 2019, Version of Record 7 July 2019.

Official paper URL: https://doi.org/10.1016/j.imavis.2019.03.002