Unsupervised text-to-image synthesis

Authors:

Highlights:

• We make the first attempt to train a text-to-image synthesis model in an unsupervised manner.

• A novel visual concept discrimination loss is proposed to train both the generator and the discriminator; it not only encourages the generated image to express the local visual concepts but also suppresses the noisy visual concepts contained in the pseudo sentence (sketched below).

• A global semantic consistency loss ensures that the generated image semantically corresponds to the input real sentence (sketched below).

• Our model can generate a pleasing image for a given sentence without relying on any paired image-text data, and it even outperforms some text-to-image synthesis models trained in a supervised manner.
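
The highlights name two training losses but do not give their formulations. The following minimal PyTorch sketch illustrates what such losses could look like: a per-concept classification term whose noisy entries are masked out, and a cosine-distance term between global image and sentence embeddings. All function names, tensor shapes, and the masking scheme are assumptions for illustration, not the paper's actual definitions.

import torch
import torch.nn.functional as F

def visual_concept_discrimination_loss(concept_logits, concept_targets, noise_mask):
    # Hypothetical sketch: treat each vocabulary word as a visual concept and
    # ask the discriminator to predict which concepts appear in the image.
    # Concepts flagged as noisy in the pseudo sentence (noise_mask == 1) are
    # excluded so they contribute no gradient.
    per_concept = F.binary_cross_entropy_with_logits(
        concept_logits, concept_targets, reduction="none")
    keep = 1.0 - noise_mask  # 1 = trusted concept, 0 = noisy concept
    return (per_concept * keep).sum() / keep.sum().clamp(min=1.0)

def global_semantic_consistency_loss(image_embedding, sentence_embedding):
    # Hypothetical sketch: pull the generated image's global embedding toward
    # the input sentence's embedding by minimizing cosine distance.
    return 1.0 - F.cosine_similarity(image_embedding, sentence_embedding, dim=-1).mean()

# Toy usage with random tensors (batch of 4, vocabulary of 1000 concepts,
# 256-dim joint embedding space); in practice the inputs would come from the
# discriminator and from pretrained image/sentence encoders.
B, V, D = 4, 1000, 256
vcd = visual_concept_discrimination_loss(
    torch.randn(B, V),                    # discriminator concept logits
    torch.randint(0, 2, (B, V)).float(),  # concepts in the pseudo sentence
    torch.zeros(B, V))                    # no concepts flagged as noisy here
gsc = global_semantic_consistency_loss(torch.randn(B, D), torch.randn(B, D))
total = vcd + gsc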

Keywords: Text-to-image synthesis, Generative adversarial network (GAN), Unsupervised training

Article history: Received 16 February 2020, Revised 20 June 2020, Accepted 4 August 2020, Available online 20 August 2020, Version of Record 1 November 2020.

DOI: https://doi.org/10.1016/j.patcog.2020.107573