STCA: Utilizing a spatio-temporal cross-attention network for enhancing video person re-identification

Authors:

Highlights:

• We propose a Spatio-Temporal Cross-Attention (STCA) network that generates cross-guided attention for video person re-identification.

• The proposed STCA uses both 2D and 3D CNNs to capture the common salient features that remain consistent across space and time.

• The generated attention gates the 2D-CNN, enhancing its means of fine-grained recognition to address misalignment.

• Optimizing STCA using cosine distance for hard triplet mining leads to faster convergence and better recognition accuracy.
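
The last highlight mentions hard triplet mining with cosine distance. The following is a minimal NumPy sketch of one common way to do this (batch-hard mining), not the authors' implementation. The function names, the batch-hard variant, and the margin value of 0.3 are all assumptions made for illustration; the listing above does not specify them.

```python
import numpy as np


def cosine_distance_matrix(x):
    """Pairwise cosine distance (1 - cosine similarity) for rows of x, shape (N, D)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x_unit = x / np.maximum(norms, 1e-12)  # guard against zero-length vectors
    return 1.0 - x_unit @ x_unit.T


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss on cosine distance (hypothetical helper).

    For each anchor, pick the farthest sample with the same identity
    (hardest positive) and the closest sample with a different identity
    (hardest negative). The margin of 0.3 is an assumed default.
    """
    dist = cosine_distance_matrix(embeddings)
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    pos_mask = same & not_self
    neg_mask = ~same

    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)

    # Only anchors with at least one positive and one negative contribute.
    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    if not valid.any():
        return 0.0
    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return float(losses.mean())


if __name__ == "__main__":
    # Toy batch: two identities, two clip-level embeddings each.
    emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    ids = np.array([0, 0, 1, 1])
    print(batch_hard_triplet_loss(emb, ids))
```

Because cosine distance depends only on the angle between embeddings, the loss is unaffected by their magnitudes, which is one plausible reason it might behave differently from Euclidean distance during mining.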

Keywords: Re-identification, Deep learning, 3D-CNNs, Cross attention

Review history: Received 1 February 2022, Revised 13 April 2022, Accepted 9 May 2022, Available online 17 May 2022, Version of Record 26 May 2022.

Official paper URL: https://doi.org/10.1016/j.imavis.2022.104474