Transformer models for enhancing AttnGAN based text to image generation

Authors:

Highlights:

• A new variant of the AttnGAN model is proposed for text-to-image (TTI) synthesis.

• The proposed AttnGANTRANS model uses a Transformer-based text encoder.

• Transformer models such as BERT, GPT-2, and XLNet are employed and analysed.

• Experiments validate that AttnGANTRANS outperforms state-of-the-art methods.

• Relative to AttnGAN, AttnGANTRANS achieves a 49.9% lower FID and a 27.23% higher Inception Score.
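The reported gains read as standard relative-change percentages, where a lower FID and a higher Inception Score both indicate better image quality. A minimal sketch of that arithmetic, using hypothetical baseline and new scores chosen only to illustrate the calculation (the paper's actual raw scores are not reproduced here):

```python
def pct_lower(baseline: float, new: float) -> float:
    """Relative reduction of a lower-is-better metric (e.g. FID), in percent."""
    return (baseline - new) / baseline * 100.0

def pct_higher(baseline: float, new: float) -> float:
    """Relative gain of a higher-is-better metric (e.g. Inception Score), in percent."""
    return (new - baseline) / baseline * 100.0

# Hypothetical values for illustration only -- not the paper's reported raw scores.
print(round(pct_lower(100.0, 50.1), 2))   # a 49.9% lower FID
print(round(pct_higher(4.0, 5.0892), 2))  # a 27.23% higher Inception Score
```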

Keywords: Generative Adversarial Networks (GANs), Natural Language Processing (NLP), Text-to-image synthesis, Transformers, Attention mechanism

Article history: Received 2 April 2021, Revised 22 July 2021, Accepted 13 August 2021, Available online 25 August 2021, Version of Record 7 September 2021.

DOI: https://doi.org/10.1016/j.imavis.2021.104284