Deep discriminative image feature learning for cross-modal semantics understanding
Abstract
Deep hashing image feature learning methods have attracted the attention of cross-modal semantics understanding researchers due to their low storage cost and efficient query speed. Typically, heterogeneous cross-modal data are embedded in a semantic space and then converted to their corresponding binary hash codes through learned hash functions. However, when mapping heterogeneous data to a common Hamming space, some works ignore the joint cross correlation that helps to interactively explore the latent semantic information between different modalities, resulting in sub-optimal features. To address these issues, we present a novel deep discriminative feature learning method for cross-modal semantics understanding, named Deep Discriminant Semantic Joint Hashing (DDSJH). To maximize joint cross correlation, we use mutual information, which contributes to semantics understanding. Features in the semantic space are exchanged with the pairwise features to compute a loss between the semantic space and the Hamming space. Thus, the corresponding information in cross-modal data is collaboratively utilized to explore the underlying mutual joint semantic correlation. Hash codes of similar categories should be as close as possible, while hash codes of data from different categories should be as discriminative as possible; we therefore harness a linear classifier to learn discriminative hash codes. Extensive experiments on two image–text cross-modal datasets show that our proposed approach achieves better accuracy than several state-of-the-art methods.
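To make the general pipeline concrete, the following is a minimal, hypothetical sketch of cross-modal hashing as described above: two modalities are projected into a shared Hamming space by per-modality hash functions and binarized, after which retrieval uses Hamming distance. This is an illustrative toy with linear projections and random data, not the authors' DDSJH implementation; all dimensions, variable names, and the linear hash functions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired image/text features (hypothetical sizes: 8 pairs,
# 16-dim image features, 12-dim text features, 4-bit codes).
n, d_img, d_txt, code_len = 8, 16, 12, 4
X_img = rng.standard_normal((n, d_img))
X_txt = rng.standard_normal((n, d_txt))

# Per-modality linear hash functions mapping into a common Hamming space
# (in DDSJH these would be learned deep networks; here they are random).
W_img = rng.standard_normal((d_img, code_len))
W_txt = rng.standard_normal((d_txt, code_len))

def hash_codes(X, W):
    """Project features and binarize with sign() into +/-1 hash codes."""
    return np.sign(X @ W)

def hamming_distance(b1, b2):
    """Count differing bits between two +/-1 hash codes."""
    return int((b1 != b2).sum())

B_img = hash_codes(X_img, W_img)   # image-modality codes, shape (n, code_len)
B_txt = hash_codes(X_txt, W_txt)   # text-modality codes, shape (n, code_len)

# Cross-modal query: rank text codes by Hamming distance to an image code.
dists = [hamming_distance(B_img[0], B_txt[j]) for j in range(n)]
```

The efficiency claim in the abstract stems from exactly this representation: compact binary codes make storage cheap, and Hamming distance reduces to XOR-and-popcount at query time.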
Keywords: Joint cross correlation, Cross-modal discriminative semantics, Deep hashing image feature learning, Linear classifier
Article history: Received 3 October 2020, Revised 28 December 2020, Accepted 24 January 2021, Available online 2 February 2021, Version of Record 6 February 2021.
DOI: https://doi.org/10.1016/j.knosys.2021.106812