Discrete Fusion Adversarial Hashing for cross-modal retrieval

Authors:

Highlights:

Abstract

Deep cross-modal hashing provides a flexible and efficient solution for large-scale cross-modal retrieval. Existing deep-hashing-based cross-modal retrieval methods aim to learn a unified hash representation for different modalities under the supervision of pair-wise correlations, and then encode out-of-sample instances with modality-specific hashing networks. However, the semantic gap and the distribution shift between modalities are not sufficiently considered, so the hash codes of different modalities cannot be unified as expected. Moreover, hashing remains a discrete optimization problem that deep neural networks have not handled well. To address these issues, we propose the Discrete Fusion Adversarial Hashing (DFAH) network for cross-modal retrieval. In DFAH, a Modality-Specific Feature Extractor captures image and text features with pair-wise supervision. In particular, a Fusion Learner learns the unified hash code, enhancing the correlation between heterogeneous modalities via an embedding strategy. Meanwhile, a Modality Discriminator adapts to the distribution shift by competing with the Modality-Specific Feature Extractor in an adversarial manner. In addition, we design an efficient discrete optimization strategy that avoids the quantization error introduced by relaxation in the deep neural framework. Finally, experimental results and analysis on several popular datasets show that DFAH outperforms state-of-the-art cross-modal retrieval methods.
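The pipeline described in the abstract — modality-specific feature extraction, fusion into a unified embedding, and discretization into binary hash codes — can be sketched at a high level. This is a minimal illustration only, not the paper's implementation: the dimensions, single-layer extractors, and `sign`-based discretization are assumptions standing in for the trained networks and the discrete optimization strategy, and the adversarial Modality Discriminator is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): image/text feature
# sizes, shared hidden size, and hash code length K.
D_IMG, D_TXT, D_HID, K = 512, 300, 128, 32

# Modality-Specific Feature Extractors, sketched as single linear
# maps with a tanh nonlinearity (placeholders for deep networks).
W_img = rng.standard_normal((D_IMG, D_HID)) * 0.01
W_txt = rng.standard_normal((D_TXT, D_HID)) * 0.01

# Fusion Learner: projects the concatenated modality features
# into a K-dimensional unified embedding.
W_fuse = rng.standard_normal((2 * D_HID, K)) * 0.01

def extract(x, W):
    """Extract modality-specific features for a batch."""
    return np.tanh(x @ W)

def fuse_and_hash(f_img, f_txt):
    """Fuse paired features, then discretize to {-1, +1} hash codes."""
    h = np.concatenate([f_img, f_txt], axis=1) @ W_fuse
    return np.sign(h)  # hard sign stands in for discrete optimization

# Toy batch of four paired image/text feature vectors.
x_img = rng.standard_normal((4, D_IMG))
x_txt = rng.standard_normal((4, D_TXT))
codes = fuse_and_hash(extract(x_img, W_img), extract(x_txt, W_txt))
print(codes.shape)  # (4, 32): one K-bit code per image-text pair
```

Retrieval then reduces to Hamming-distance comparison between such binary codes, which is what makes hashing efficient at scale.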

Keywords: Cross-modal retrieval, Deep hashing, Discrete optimization, Fusion learning, Adversarial learning

Article history: Received 17 November 2021, Revised 19 July 2022, Accepted 20 July 2022, Available online 25 July 2022, Version of Record 5 August 2022.

DOI: https://doi.org/10.1016/j.knosys.2022.109503