Revisiting image captioning via maximum discrepancy competition

Authors:

Highlights:

• We propose a new model comparison method that avoids a prohibitively expensive large-scale subjective annotation experiment.

• We propose a new similarity function, NGSM, as a semantic distance measure that models the discrepancy between captions. With NGSM, informative images can be selected effectively from an arbitrary large-scale raw image dataset.

• We report quantitative results on the generalization ability of the competing image captioning models (ICMs) and provide a detailed analysis of the key factor in improving that ability.
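
The selection idea in the highlights (picking the images on which two captioning models disagree most) can be illustrated with a minimal sketch. NGSM itself is not defined in this listing, so the code below substitutes a plain token-level Jaccard similarity as a stand-in; the function names, the data layout, and the `top_k` parameter are all assumptions made for illustration and are not the authors' method.

```python
from itertools import combinations


def jaccard_similarity(caption_a: str, caption_b: str) -> float:
    """Token-overlap similarity in [0, 1]. A stand-in for NGSM, NOT NGSM itself."""
    tokens_a = set(caption_a.lower().split())
    tokens_b = set(caption_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty captions are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_most_discrepant(captions_by_model, top_k=1, similarity=jaccard_similarity):
    """For every pair of models, return the top_k image ids whose captions differ most.

    captions_by_model: {model_name: {image_id: caption}} (hypothetical layout).
    Lower similarity means larger discrepancy, so images are ranked ascending.
    """
    selected = {}
    for model_a, model_b in combinations(sorted(captions_by_model), 2):
        caps_a = captions_by_model[model_a]
        caps_b = captions_by_model[model_b]
        shared_ids = sorted(set(caps_a) & set(caps_b))
        ranked = sorted(
            shared_ids,
            key=lambda image_id: similarity(caps_a[image_id], caps_b[image_id]),
        )
        selected[(model_a, model_b)] = ranked[:top_k]
    return selected


# Toy, made-up captions for two hypothetical models.
example_captions = {
    "model_a": {"img1": "a dog runs on the grass", "img2": "a man rides a horse"},
    "model_b": {"img1": "a dog runs on the grass", "img2": "a cat sleeps on a sofa"},
}
most_discrepant = select_most_discrepant(example_captions, top_k=1)
```

In this toy example the two models agree exactly on `img1` and share only one token on `img2`, so `img2` is the image selected for the pair. Only the images selected this way would then need human judgment, which is how such a scheme can avoid annotating the whole dataset.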


Keywords: Image captioning, Model comparison, Attention mechanism

Review history: Received 19 January 2021, Revised 23 August 2021, Accepted 29 September 2021, Available online 1 October 2021, Version of Record 8 October 2021.

Official paper URL: https://doi.org/10.1016/j.patcog.2021.108358