Probabilistic framework for solving visual dialog

Highlights：

• Probabilistic Representation Module: Through this module, we obtain probabilistic representations for image, question, and history of the conversation using Bayesian CNN and Bayesian RNN modules.

• Diverse Latent Answer Generation Module: In this module, we use a variational autoencoder based latent representation that allows us to obtain latent representations from which we can sample answers.

• Uncertainty Representation Module: In this module, we propose a Reverse Uncertainty based Attention Map (RUAM) method by using Bayesian deep learning methods that allows us to minimize data uncertainty and model uncertainty.

摘要

•Probabilistic Representation Module: Through this module, we obtain probabilistic representations for image, question, and history of the conversation using Bayesian CNN and Bayesian RNN modules.•Diverse Latent Answer Generation Module: In this module, we use a variational autoencoder based latent representation that allows us to obtain latent representations from which we can sample answers.•Uncertainty Representation Module: In this module, we propose a Reverse Uncertainty based Attention Map (RUAM) method by using Bayesian deep learning methods that allows us to minimize data uncertainty and model uncertainty.

论文关键词：CNN,LSTM,Uncertainty,Aleatoric uncertainty,Epistemic uncertainty vision and language,Visual dialog,VQA,Answer generation,Question generation,Bayesian deep learning

论文评审过程：Received 11 September 2019, Revised 26 April 2020, Accepted 9 August 2020, Available online 15 August 2020, Version of Record 1 November 2020.