Multi-attention based semantic deep hashing for cross-modal retrieval

Authors: Liping Zhu, Gangyi Tian, Bingyao Wang, Wenjie Wang, Di Zhang, Chengyang Li

Abstract

Cross-modal hashing is an efficient method for retrieving data across domains. Most previous methods focus on measuring intra-modality and inter-modality discrepancies. However, recent research shows that semantic information is also vital for cross-modal retrieval. In the human visual system, people establish multi-modal connections through an attention mechanism guided by semantic information. Most previous attention-based methods simply apply single-modality attention and ignore the effectiveness of multi-attention. Multi-attention consists of features from different semantic representation spaces; by exploiting the attention mechanism, it can guide the output features toward alignment and thus better bridge the semantic gap among modalities. From this perspective, we propose a new cross-modal hashing method in this paper: 1) we design a multi-attention block to extract features influenced by multi-attention, and 2) we propose a correlative loss function to optimize the multi-attention matrix generated by the block and to keep the subsequently generated hash codes consistent and semantically correlated. Experiments on three challenging benchmarks demonstrate the effectiveness of our method for cross-modal retrieval.
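The abstract does not give the exact form of the multi-attention block or the correlative loss. Purely as an illustration of the idea described above, the following is a minimal PyTorch-style sketch in which each modality attends over a shared set of learnable semantic slots, the two attention matrices are pulled together by a correlative term, and a similarity-preserving term acts on the relaxed hash codes. The class name, slot mechanism, dimensions, and loss terms are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical sketch of a multi-attention hashing block (NOT the authors'
# exact architecture): each modality attends over shared semantic "slots",
# the two attention matrices are aligned by a correlative loss, and the
# attended features are mapped to relaxed hash codes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiAttentionHashBlock(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=1386, num_slots=8, hash_bits=64):
        super().__init__()
        # Per-modality projections into a shared semantic space (assumed dims).
        self.img_proj = nn.Linear(img_dim, 512)
        self.txt_proj = nn.Linear(txt_dim, 512)
        # Learnable semantic slots that both modalities attend to.
        self.slots = nn.Parameter(torch.randn(num_slots, 512))
        # Hash layers: tanh yields relaxed codes in (-1, 1).
        self.img_hash = nn.Linear(512, hash_bits)
        self.txt_hash = nn.Linear(512, hash_bits)

    def attend(self, feat):
        # Attention of each sample over the slots: (batch, num_slots).
        attn = F.softmax(feat @ self.slots.t() / feat.size(-1) ** 0.5, dim=-1)
        # Attended representation: weighted sum of slots, (batch, 512).
        return attn, attn @ self.slots

    def forward(self, img_feat, txt_feat):
        img_attn, img_ctx = self.attend(self.img_proj(img_feat))
        txt_attn, txt_ctx = self.attend(self.txt_proj(txt_feat))
        img_code = torch.tanh(self.img_hash(img_ctx))
        txt_code = torch.tanh(self.txt_hash(txt_ctx))
        return img_attn, txt_attn, img_code, txt_code


def correlative_loss(img_attn, txt_attn, img_code, txt_code, sim, alpha=1.0):
    # Align the attention matrices of the two modalities for paired samples ...
    attn_term = F.mse_loss(img_attn, txt_attn)
    # ... and preserve pairwise similarity `sim` (1 = related, 0 = unrelated)
    # via a negative log-likelihood on inner products of the relaxed codes.
    theta = img_code @ txt_code.t() / 2
    sim_term = (F.softplus(theta) - sim * theta).mean()
    return sim_term + alpha * attn_term
```

At retrieval time, such a sketch would binarize the relaxed codes with `torch.sign` and rank items by Hamming distance; the weighting `alpha` between the two terms is likewise an illustrative hyperparameter.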

Keywords: Cross-modal retrieval, Deep hashing, Attention mechanism, Semantic feature alignment

Paper link: https://doi.org/10.1007/s10489-020-02137-w