Deep Human-Interaction and Association by Graph-Based Learning for Multiple Object Tracking in the Wild

Authors: Cong Ma, Fan Yang, Yuan Li, Huizhu Jia, Xiaodong Xie, Wen Gao

Abstract

Multiple Object Tracking (MOT) in the wild has a wide range of applications in surveillance retrieval and autonomous driving. Tracking-by-detection, which consists of feature extraction and data association, has become the mainstream solution for MOT. Most existing methods focus on extracting the individual features of targets and optimize the association with hand-crafted algorithms. In this paper, we explicitly consider the interrelation cue between targets and propose the Human-Interaction Model (HIM) to extract interaction features between a tracked target and its surroundings. These interaction features are more discriminative for distinguishing objects, especially in crowded (dense) scenes. We also propose an efficient end-to-end model, the Deep Association Network (DAN), which optimizes the association with a graph-based learning mechanism. Both HIM and DAN are built from three kinds of deep networks: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Graph Neural Networks (GNNs). The CNNs extract appearance features from bounding-box images, and the RNNs encode motion features from the historical positions of each trajectory. The GNNs then extract interaction features and optimize the graph structure to associate objects across frames. In addition, we present a novel end-to-end training strategy for the Deep Association Network and the Human-Interaction Model. Experimental results demonstrate that our method reaches state-of-the-art performance on the MOT15, MOT16 and DukeMTMCT datasets.
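To make the two stages of tracking-by-detection concrete, the following is a minimal sketch of a hand-crafted association step of the kind the abstract contrasts with learned association. It is not the paper's DAN or HIM. The function names, the dictionary keys (`feature`, `predicted_center`, `center`), the equal weighting of appearance and motion, the distance scale, and the matching threshold are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Appearance affinity: cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def motion_affinity(predicted_center, detected_center, scale=50.0):
    """Motion affinity: decays with distance between predicted and detected centers.

    `scale` (in pixels) is an assumed constant, not a value from the paper.
    """
    return math.exp(-math.dist(predicted_center, detected_center) / scale)


def affinity_matrix(tracks, detections, appearance_weight=0.5):
    """Combine appearance and motion cues into a tracks-by-detections matrix."""
    return [
        [
            appearance_weight * cosine_similarity(t["feature"], d["feature"])
            + (1.0 - appearance_weight)
            * motion_affinity(t["predicted_center"], d["center"])
            for d in detections
        ]
        for t in tracks
    ]


def greedy_associate(affinity, threshold=0.5):
    """Hand-crafted association: repeatedly take the best remaining pair.

    Returns a dict mapping track index to detection index. Pairs whose
    affinity falls below `threshold` are left unmatched.
    """
    candidates = sorted(
        (
            (score, i, j)
            for i, row in enumerate(affinity)
            for j, score in enumerate(row)
        ),
        reverse=True,
    )
    matches, used_detections = {}, set()
    for score, i, j in candidates:
        if score < threshold:
            break
        if i in matches or j in used_detections:
            continue
        matches[i] = j
        used_detections.add(j)
    return matches


# Toy example: two tracks, with the detections listed in swapped order.
tracks = [
    {"feature": [1.0, 0.0], "predicted_center": (10.0, 10.0)},
    {"feature": [0.0, 1.0], "predicted_center": (100.0, 100.0)},
]
detections = [
    {"feature": [0.0, 1.0], "center": (102.0, 101.0)},
    {"feature": [1.0, 0.0], "center": (11.0, 12.0)},
]
matches = greedy_associate(affinity_matrix(tracks, detections))
print(matches)
```

In this sketch each target is scored in isolation, using only its own appearance and motion. The abstract's argument is that such individual features become ambiguous in crowded scenes, which is the motivation for adding interaction features and learning the association on a graph.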

Keywords: Multiple Object Tracking in the Wild, Human Interaction Model, Deep Association Network, Graph Neural Network

Paper URL: https://doi.org/10.1007/s11263-021-01460-0