Multi-task framework based on feature separation and reconstruction for cross-modal retrieval

Authors:

Highlights:

• We introduce feature separation into the traditional cross-modal retrieval task to deal with the information asymmetry between modalities, and use different loss functions to supervise the different parts of the feature vectors.

• We introduce image and text reconstruction tasks on the modality-specific information of images and texts, enforcing accurate feature separation and improving the quality of the specific information.

• We adopt a multi-task learning framework that integrates the cross-modal retrieval task with the image and text reconstruction tasks, further improving cross-modal retrieval performance through joint training.

• We conduct extensive experiments on the MS-COCO and Flickr30K datasets. Our empirical results demonstrate that feature separation and specific-information reconstruction can significantly improve the baseline performance of cross-modal image-text retrieval.
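The highlights above describe a joint objective: each feature vector is split into a modality-shared part (supervised by a retrieval loss) and a modality-specific part (supervised by a reconstruction loss), and the losses are combined for joint training. A minimal illustrative sketch of that idea is shown below; the function names, the split point `shared_dim`, and the loss weights are hypothetical, not taken from the paper.

```python
# Illustrative sketch (not the authors' implementation): split each feature
# vector into a modality-shared part and a modality-specific part, supervise
# each part with its own loss, and combine the losses for joint training.

def separate(features, shared_dim):
    """Split a feature vector into (shared, specific) parts.
    `shared_dim` is a hypothetical split point."""
    return features[:shared_dim], features[shared_dim:]

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def joint_loss(img_feat, txt_feat, img_recon_err, txt_recon_err,
               shared_dim, w_retrieval=1.0, w_recon=0.5):
    """Combine a retrieval loss on the shared parts with reconstruction
    losses on the specific parts. The weights are illustrative placeholders
    for the balancing hyperparameters of a multi-task objective."""
    img_shared, _img_specific = separate(img_feat, shared_dim)
    txt_shared, _txt_specific = separate(txt_feat, shared_dim)
    retrieval_loss = l2_distance(img_shared, txt_shared)  # align shared parts
    recon_loss = img_recon_err + txt_recon_err            # supervise specific parts
    return w_retrieval * retrieval_loss + w_recon * recon_loss
```

In practice the retrieval term would be a ranking loss (e.g. a triplet loss over image-text pairs) and the reconstruction errors would come from decoder networks over the specific parts; the sketch only shows how the separated parts feed different loss terms of one joint objective.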


Keywords: Cross-modal retrieval, Feature separation, Image reconstruction, Text reconstruction

Article history: Received 1 July 2020, Revised 19 July 2021, Accepted 31 July 2021, Available online 2 August 2021, Version of Record 8 September 2021.

DOI: https://doi.org/10.1016/j.patcog.2021.108217