Deep code operation network for multi-label image retrieval

Authors:

Highlights:

Abstract

Deep hashing methods have been extensively studied for large-scale image search and have achieved promising results in recent years. However, previous deep hashing methods have two major limitations for multi-label image retrieval: the first concerns the flexibility for users to express their query intention (the so-called intention gap), and the second concerns the exploitation of the rich similarity structures of the semantic space (the so-called semantic gap). To address these issues, we propose a novel Deep Code Operation Network (CoNet), in which a user is allowed to present multiple images simultaneously, instead of a single one, as the query; the system then triggers a series of code operators to extract the hidden relations among them. In this way, a set of new queries is automatically constructed to cover the user's real, complex query intention, without the need to state it explicitly. The CoNet is trained with a newly proposed margin-adaptive triplet loss function, which effectively encourages the system to incorporate the hierarchical similarity structures of the semantic space into the learning of the code operations. The whole system has an end-to-end differentiable architecture, equipped with an adversarial mechanism to further improve the quality of the final intention representation. Experimental results on four multi-label image datasets demonstrate that our method significantly improves on the state of the art in complex multi-label retrieval tasks with multiple query images.
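To make the idea of a margin-adaptive triplet loss concrete, the sketch below shows one plausible formulation in PyTorch, where the margin of each triplet scales with the difference in label overlap between the (anchor, positive) and (anchor, negative) pairs. This is an illustrative assumption, not the paper's exact loss; the function name, the Jaccard-style label similarity, and the `base_margin` parameter are all hypothetical.

```python
import torch
import torch.nn.functional as F


def margin_adaptive_triplet_loss(anchor, positive, negative,
                                 labels_a, labels_p, labels_n,
                                 base_margin=0.5):
    """Hypothetical sketch of a margin-adaptive triplet loss.

    The margin grows with the gap in label overlap between the
    (anchor, positive) and (anchor, negative) pairs, so triplets whose
    semantic similarities differ more are separated by a larger margin.
    Inputs: real-valued codes of shape (B, d) and multi-hot label
    matrices of shape (B, C).
    """
    def label_sim(a, b):
        # Jaccard-style label similarity as a proxy for semantic similarity.
        inter = (a * b).sum(dim=1).float()
        union = ((a + b) > 0).sum(dim=1).float().clamp(min=1.0)
        return inter / union

    sim_ap = label_sim(labels_a, labels_p)            # shared labels with the positive
    sim_an = label_sim(labels_a, labels_n)            # shared labels with the negative
    margin = base_margin * (sim_ap - sim_an).clamp(min=0.0)  # per-triplet adaptive margin

    d_ap = (anchor - positive).pow(2).sum(dim=1)      # squared distance to positive
    d_an = (anchor - negative).pow(2).sum(dim=1)      # squared distance to negative
    return F.relu(d_ap - d_an + margin).mean()
```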

Keywords:

Review history: Received 28 July 2019, Revised 25 January 2020, Accepted 28 January 2020, Available online 4 February 2020, Version of Record 7 February 2020.

Paper link: https://doi.org/10.1016/j.cviu.2020.102916