Counterfactual explanation based on gradual construction for deep networks

Authors:

Highlights:

• We propose a counterfactual explanation method that deduces not only the important features of an input space but also how those features should be modified to classify the input as a target class.

• By observing the feature variation among different classes, we show that the characteristics deep networks have learned from a training dataset can be analyzed across various domains.

• The proposed method is based on gradual construction to search for the most important features for an explanation, and we reveal that considering the logit distribution of a training dataset is crucial for generating human-friendly explanations. (A minimal sketch of the general counterfactual idea follows this list.)
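The highlights describe the approach only at a high level. As a concrete illustration, below is a minimal PyTorch sketch of gradient-based counterfactual search in the same spirit: it ranks input features by saliency for a target class, then optimizes only the top-ranked features until the classifier's prediction moves toward that class, with an L2 penalty keeping the counterfactual close to the original input. This is an assumption-laden toy, not the paper's gradual-construction algorithm; the stand-in model, the `counterfactual` helper, the feature budget, and the loss weights are all hypothetical.

```python
# Minimal sketch of counterfactual search, assuming a generic differentiable
# PyTorch classifier. NOT the paper's exact gradual-construction algorithm;
# it only illustrates the shared idea: modify a small set of input features
# until the model assigns the input to a target class.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in classifier; any differentiable model would do.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
model.eval()
for p in model.parameters():          # freeze weights; only the input changes
    p.requires_grad_(False)

def counterfactual(x, target, n_features=2, steps=200, lr=0.1, lam=0.1):
    """Pick the most influential features by saliency, then optimize only them.

    x          -- 1-D input tensor of shape (10,)
    target     -- index of the desired class
    n_features -- how many input features may be modified
    lam        -- weight of the proximity (L2) penalty
    """
    x = x.clone().detach()
    # Rank features by gradient magnitude of the target logit (saliency).
    probe = x.clone().requires_grad_(True)
    model(probe.unsqueeze(0))[0, target].backward()
    idx = probe.grad.abs().topk(n_features).indices   # features allowed to change

    delta = torch.zeros_like(x, requires_grad=True)
    mask = torch.zeros_like(x)
    mask[idx] = 1.0
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model((x + mask * delta).unsqueeze(0))
        # Cross-entropy pushes the prediction toward the target class;
        # the L2 term keeps the counterfactual close to the original input.
        loss = nn.functional.cross_entropy(logits, torch.tensor([target]))
        loss = loss + lam * (mask * delta).pow(2).sum()
        loss.backward()
        opt.step()
    return (x + mask * delta).detach(), idx

x = torch.randn(10)
x_cf, changed = counterfactual(x, target=2)
print("edited features:", changed.tolist())
print("new prediction :", model(x_cf.unsqueeze(0)).argmax().item())
```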


Keywords: Explainable AI, Counterfactual explanation, Interpretability, Model-agnostics, Generative model

Article history: Received 28 July 2020, Revised 19 August 2021, Accepted 7 August 2022, Available online 8 August 2022, Version of Record 16 August 2022.

DOI: https://doi.org/10.1016/j.patcog.2022.108958