Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning
Authors:
Highlights:
• A deep image-to-video person re-identification pipeline with two modules is proposed to learn fine-grained and temporally invariant features.
• To address appearance misalignment, a three-dimensional semantic appearance alignment (3D-SAA) module is designed to semantically align different human body parts in the 3D surface space.
• To address modality misalignment, a cross-modal interactive learning (CMIL) module is developed to fuse the two modalities with an interactive similarity comparison mechanism.
• A multi-branch aggregation network in the 3D-SAA module is designed to weaken the influence of negligible body parts and backgrounds.
Keywords: Person re-identification, Cross-modal learning, Appearance alignment
Review history: Received 19 November 2020, Revised 20 June 2021, Accepted 9 September 2021, Available online 20 September 2021, Version of Record 24 September 2021.
Official paper URL: https://doi.org/10.1016/j.patcog.2021.108314