Cross-modal discriminant adversarial network

Authors:

Highlights:

• In this paper, we propose a novel method termed Cross-modal Discriminant Adversarial Network (CAN) to learn a latent discriminant space for cross-modal data, built on a novel network structure and a novel learning mechanism, the Cross-modal Discriminant Mechanism (CDM). In brief, CDM projects the generated features of all modalities into a latent common space and provides positive/negative feedback to the adversarial learning. Our method can therefore reduce the modality discrepancy while preserving the discriminative information in the common space (a minimal architectural sketch follows these highlights).

• To improve our CDM, a novel objective function is presented to learn a common space in which within-class samples are compacted and between-class samples are scattered. Furthermore, the transformations of the CDM can be solved analytically from the generated features, thus escaping the trap of local minima.

• To avoid trivial solutions when directly optimizing the CDM objective function, a novel logarithmic eigenvalue-based loss is proposed. A further advantage of the proposed loss is that it pushes as much discrimination as possible into all latent directions of the CDM transformations instead of only the dominant ones (both criteria are sketched after these highlights).
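The first highlight names the components of CAN without showing how they interact. The following minimal PyTorch-style sketch illustrates one plausible arrangement: two modality encoders, a modality discriminator supplying the adversarial signal, and a shared projection standing in for the CDM transform. All module names, layer sizes, and the image/text label convention are assumptions for illustration, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Modality-specific feature generator (layer sizes are placeholders)."""
    def __init__(self, in_dim, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, feat_dim))
    def forward(self, x):
        return self.net(x)

class ModalityDiscriminator(nn.Module):
    """Predicts which modality a generated feature came from (1 = image, 0 = text, assumed)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, f):
        return self.net(f)  # raw logits

img_enc, txt_enc = Encoder(4096, 512), Encoder(300, 512)  # e.g. CNN / word-vector inputs
disc = ModalityDiscriminator(512)
proj = nn.Linear(512, 128, bias=False)  # shared projection into the latent common space

bce = nn.BCEWithLogitsLoss()
img, txt = torch.randn(8, 4096), torch.randn(8, 300)  # a dummy mini-batch
f_img, f_txt = img_enc(img), txt_enc(txt)

# Adversarial part: the discriminator learns to separate the modalities,
# the encoders learn to fool it, which reduces the modality discrepancy.
d_loss = bce(disc(f_img.detach()), torch.ones(8, 1)) + \
         bce(disc(f_txt.detach()), torch.zeros(8, 1))
g_loss = bce(disc(f_img), torch.zeros(8, 1)) + bce(disc(f_txt), torch.ones(8, 1))

# Common-space features on which a discriminant criterion (sketched below)
# would be evaluated and fed back to the encoders as positive/negative feedback.
z_img, z_txt = proj(f_img), proj(f_txt)
```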
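The second and third highlights describe the discriminant objective and the logarithmic eigenvalue-based loss only informally. Below is a minimal sketch of one standard LDA-style instantiation consistent with that description; the scatter matrices S_w and S_b, the eigenvalues λ_i, and the number of directions k are assumptions inferred from the wording, not the paper's exact formulation.

```latex
% Assumed common-space features z_i = W^{\top} f_i, class means \mu_c, global mean \mu.
\[
  S_w = \sum_{c=1}^{C} \sum_{i \in c} (z_i - \mu_c)(z_i - \mu_c)^{\top},
  \qquad
  S_b = \sum_{c=1}^{C} n_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top}.
\]
% Compacting within-class samples while scattering between-class samples leads
% to the generalized eigenproblem S_b w = \lambda S_w w, whose solution is
% analytic in the generated features. A logarithmic eigenvalue-based loss of
% the assumed form
\[
  \mathcal{L}_{\mathrm{eig}} = -\frac{1}{k} \sum_{i=1}^{k} \log \lambda_i
\]
% over the k weakest eigenvalues penalizes directions that carry little
% discrimination, spreading it across all latent directions rather than only
% the dominant ones.
```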

Keywords: Adversarial learning, Cross-modal representation learning, Cross-modal retrieval, Discriminant adversarial network, Cross-modal discriminant mechanism, Latent common space

Article history: Received 3 June 2020, Revised 2 October 2020, Accepted 29 October 2020, Available online 5 November 2020, Version of Record 30 January 2021.

DOI: https://doi.org/10.1016/j.patcog.2020.107734