Explanation vs. attention: A two-player game to obtain attention for VQA and visual dialog
Authors:
Highlights:
• We propose a way to obtain surrogate supervision for learning better attention maps, using visual explanations in the form of Grad-CAM results.
• This method performs better than other forms of surrogate supervision such as RISE.
• The surrogate supervision is best used through a variant of adversarial learning, which yields attention maps that correlate well with the visual explanation.
• We further observe that this performs better than other means of supervision, such as MMD or CORAL losses.
• We provide comparisons and results showing that the resulting attention maps correlate well with human attention maps and outperform other techniques for VQA.
• We further show that better attention maps also lead to better accuracy on VQA, and we provide a detailed empirical analysis of this.
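The highlights name MMD and CORAL losses as alternative ways to align attention maps with explanation maps. As a rough illustration of what those two losses measure, the sketch below applies their standard textbook definitions to batches of flattened maps. It is not the authors' implementation: the function names, the 14x14 map size, the RBF kernel, and the bandwidth `sigma` are all assumptions made for this example, and random arrays stand in for real attention and Grad-CAM outputs.

```python
import numpy as np


def coral_loss(source, target):
    """CORAL: squared Frobenius distance between the two feature
    covariance matrices, scaled by 1 / (4 * d^2)."""
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False)  # (d, d) covariance of source rows
    cov_t = np.cov(target, rowvar=False)  # (d, d) covariance of target rows
    return np.sum((cov_s - cov_t) ** 2) / (4.0 * d * d)


def rbf_kernel(x, y, sigma):
    """Pairwise RBF kernel values between rows of x and rows of y."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy (RBF kernel)."""
    return (
        rbf_kernel(x, x, sigma).mean()
        + rbf_kernel(y, y, sigma).mean()
        - 2.0 * rbf_kernel(x, y, sigma).mean()
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins: 8 attention maps and 8 Grad-CAM maps,
    # each a flattened 14x14 grid normalised to sum to 1.
    attention = rng.random((8, 196))
    gradcam = rng.random((8, 196))
    attention /= attention.sum(axis=1, keepdims=True)
    gradcam /= gradcam.sum(axis=1, keepdims=True)

    print("CORAL loss:", coral_loss(attention, gradcam))
    print("MMD^2:", mmd2(attention, gradcam, sigma=0.1))
```

Both functions compare statistics of the two batches rather than individual map pairs, which is one plausible reason a learned adversarial discriminator might align the maps more closely, as the highlights report.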
Keywords: CNN, LSTM, Explanation, Attention, Grad-CAM, MMD, CORAL, GAN, VQA, Visual Dialog, Deep learning
Review history: Received 13 March 2022, Accepted 10 July 2022, Available online 23 July 2022, Version of Record 28 July 2022.
Official paper URL: https://doi.org/10.1016/j.patcog.2022.108898