A multi-context representation approach with multi-task learning for object counting

Abstract

Object counting is a fundamental yet challenging computer vision task, as it requires both information about object appearance and a semantic understanding of the objects. In this paper, we propose an end-to-end multi-context embedding deep network for object counting (MCENet), which approaches the counting task from three different perspectives, whether counting vehicles in traffic video frames or estimating the number of pedestrians in heavily congested scenes. The first sub-network of MCENet extracts candidate features for the appearance context and the semantic context from layers at different levels. These two levels of features are then passed to two parallel, complementary sub-networks, which model the appearance context and the semantic context, respectively, for the final count. In this way, multiple contexts are represented and embedded to assist the counting task. Extensive experiments on three different object counting benchmarks show that the proposed approach achieves competitive performance across all of these heterogeneous scenarios.
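The data flow described above (a shared first sub-network whose features at two depths feed two parallel branches) can be illustrated with a minimal, pure-Python sketch. It shows only the wiring, not MCENet itself. Everything here is a hypothetical assumption for illustration: the function names (`shared_trunk`, `appearance_branch`, `semantic_branch`, `count`), the ReLU and 2x2 average-pooling stand-ins for convolutional layers, the fixed scalar weights in place of learned parameters, and the averaging used to fuse the two branch outputs.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a single-channel feature map, rows x cols


def avg_pool2(x: Grid) -> Grid:
    """2x2 average pooling with stride 2 (odd trailing rows/cols are dropped)."""
    rows, cols = len(x) // 2, len(x[0]) // 2
    return [
        [
            (x[2 * i][2 * j] + x[2 * i][2 * j + 1]
             + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def shared_trunk(image: Grid) -> Tuple[Grid, Grid]:
    """Hypothetical first sub-network: tap a shallow and a deeper feature map."""
    low_level = [[max(0.0, v) for v in row] for row in image]  # ReLU stand-in
    high_level = avg_pool2(avg_pool2(low_level))  # two downsampling stages
    return low_level, high_level


def appearance_branch(low_level: Grid, weight: float = 1.0) -> float:
    """Hypothetical appearance-context branch: per-location density, summed."""
    return sum(weight * v for row in low_level for v in row)


def semantic_branch(high_level: Grid, weight: float = 16.0) -> float:
    """Hypothetical semantic-context branch on the coarser map.

    Each coarse cell averages a 4x4 input patch, so the default weight of 16
    rescales that average back to a count over the patch.
    """
    return sum(weight * v for row in high_level for v in row)


def count(image: Grid) -> float:
    """Run the shared trunk, both branches, and an assumed averaging fusion."""
    low, high = shared_trunk(image)
    # Averaging the two branch outputs is an assumption for illustration only.
    return 0.5 * (appearance_branch(low) + semantic_branch(high))
```

For a 4x4 map of ones, each branch returns 16.0, so `count` returns 16.0. In a trained network, the branch weights and fusion would be learned rather than fixed by hand as they are here.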

Keywords: Object counting, Multi-task learning, Multi-context representation, Appearance context, Multi-scale semantic

Article history: Received 17 January 2020, Revised 3 April 2020, Accepted 16 April 2020, Available online 18 April 2020, Version of Record 24 April 2020.

Official paper page: https://doi.org/10.1016/j.knosys.2020.105927