Probabilistic framework for solving visual dialog
作者:
Highlights:
• Probabilistic Representation Module: Through this module, we obtain probabilistic representations for image, question, and history of the conversation using Bayesian CNN and Bayesian RNN modules.
• Diverse Latent Answer Generation Module: In this module, we use a variational autoencoder based latent representation that allows us to obtain latent representations from which we can sample answers.
• Uncertainty Representation Module: In this module, we propose a Reverse Uncertainty based Attention Map (RUAM) method by using Bayesian deep learning methods that allows us to minimize data uncertainty and model uncertainty.
摘要
•Probabilistic Representation Module: Through this module, we obtain probabilistic representations for image, question, and history of the conversation using Bayesian CNN and Bayesian RNN modules.•Diverse Latent Answer Generation Module: In this module, we use a variational autoencoder based latent representation that allows us to obtain latent representations from which we can sample answers.•Uncertainty Representation Module: In this module, we propose a Reverse Uncertainty based Attention Map (RUAM) method by using Bayesian deep learning methods that allows us to minimize data uncertainty and model uncertainty.
论文关键词:CNN,LSTM,Uncertainty,Aleatoric uncertainty,Epistemic uncertainty vision and language,Visual dialog,VQA,Answer generation,Question generation,Bayesian deep learning
论文评审过程:Received 11 September 2019, Revised 26 April 2020, Accepted 9 August 2020, Available online 15 August 2020, Version of Record 1 November 2020.
论文官网地址:https://doi.org/10.1016/j.patcog.2020.107586