Probabilistic framework for solving visual dialog

Authors:

Highlights:

• Probabilistic Representation Module: Through this module, we obtain probabilistic representations of the image, the question, and the conversation history using Bayesian CNN and Bayesian RNN modules.

• Diverse Latent Answer Generation Module: In this module, we use a variational autoencoder (VAE) to learn a latent representation from which diverse answers can be sampled.

• Uncertainty Representation Module: In this module, we propose a Reverse Uncertainty based Attention Map (RUAM) method, built on Bayesian deep learning, that allows us to minimize both data (aleatoric) uncertainty and model (epistemic) uncertainty.
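
The highlights distinguish data uncertainty from model uncertainty but do not say how either is computed. As a minimal sketch, assuming the Bayesian network is approximated by several stochastic forward passes (for example Monte Carlo dropout, which is an assumption here and not something the source confirms), one common way to separate the two is an entropy-based decomposition. The function names below are hypothetical and are not taken from the paper.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose_uncertainty(samples):
    """Split predictive uncertainty into data and model parts.

    samples: list of T probability vectors, one per stochastic forward pass
             (assumed to come from something like MC dropout).

    Returns (total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual predictions,
      epistemic = total - aleatoric (the mutual information).
    """
    t = len(samples)
    k = len(samples[0])
    mean = [sum(s[i] for s in samples) / t for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(s) for s in samples) / t
    return total, aleatoric, total - aleatoric


# Passes agree on an ambiguous answer: all uncertainty is in the data.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])

# Passes are individually confident but contradict each other:
# all uncertainty is in the model.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

In the first case the epistemic term is zero and the aleatoric term equals ln 2; in the second the roles are reversed. An attention scheme such as RUAM could in principle use scores of this kind, but how the paper actually does so is not stated in this summary.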

Keywords: CNN, LSTM, Uncertainty, Aleatoric uncertainty, Epistemic uncertainty, Vision and language, Visual dialog, VQA, Answer generation, Question generation, Bayesian deep learning

Article history: Received 11 September 2019, Revised 26 April 2020, Accepted 9 August 2020, Available online 15 August 2020, Version of Record 1 November 2020.

Official paper URL: https://doi.org/10.1016/j.patcog.2020.107586