Explaining black-box classifiers using post-hoc explanations-by-example: The effect of explanations and error-rates in XAI user studies
Authors:
Highlights:
• Series of controlled user studies examining post-hoc example-based explanations for black-box deep learners doing classification (XAI).
• Black-box AI models can be explained by “twinning” them with white-box models.
• Explanations were only found to impact people’s perception of errors.
• Explanations lead people to view errors as being “less incorrect”, but they do not improve trust.
• Trust in an AI model is undermined by increases in error-rates (from 3% error-levels onwards).
Abstract
In this paper, we describe a post-hoc explanation-by-example approach to eXplainable AI (XAI), in which a black-box deep learning system is explained by reference to a more transparent proxy model (here, a case-based reasoner). A feature-weighting analysis of the black box is used to find explanatory cases from the proxy, making this one instance of the so-called Twin Systems approach. A novel method (COLE-HP) for extracting feature-weights from black-box models is demonstrated for a convolutional neural network (CNN) applied to the MNIST dataset, where the extracted feature-weights are used to find explanatory nearest neighbours for test instances. Three user studies are reported that examine people's judgements of right and wrong classifications made by this XAI twin-system, in the presence or absence of explanations-by-example and at different error-rates (from 3% to 60%). The judgements gathered include item-level evaluations of correctness and reasonableness, and system-level evaluations of trust, satisfaction, correctness, and reasonableness. Several proposals are made about the user's mental model in these tasks and how it is affected by explanations at the item level and the system level. The wider lessons from this work for XAI and its user studies are reviewed.
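The retrieval step described in the abstract, using feature-weights to find explanatory nearest neighbours, can be illustrated with a minimal sketch. This is not COLE-HP: the abstract does not say how the weights are extracted, so the weights below are invented values, and the function names (`weighted_distance`, `explanatory_cases`), the toy case base, and the choice of a weighted Euclidean metric are all assumptions made for illustration.

```python
from math import sqrt


def weighted_distance(a, b, weights):
    """Euclidean distance with each squared feature difference scaled by a
    non-negative weight (assumed metric, not taken from the paper)."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def explanatory_cases(query, case_base, weights, k=3):
    """Return the k cases closest to the query under the weighted metric.

    case_base is a list of (features, label) pairs; the returned cases play
    the role of explanations-by-example for the query's classification.
    """
    ranked = sorted(
        case_base,
        key=lambda case: weighted_distance(query, case[0], weights),
    )
    return ranked[:k]


# Toy case base standing in for training instances (hypothetical values).
case_base = [
    ([0.9, 0.1, 0.5], "seven"),
    ([0.8, 0.2, 0.9], "seven"),
    ([0.1, 0.9, 0.5], "one"),
    ([0.2, 0.8, 0.1], "one"),
]

# Hypothetical feature-weights: the third feature is treated as irrelevant.
weights = [1.0, 1.0, 0.0]

query = [0.85, 0.15, 0.0]
neighbours = explanatory_cases(query, case_base, weights, k=2)
for features, label in neighbours:
    print(label, features)
```

With the third feature weighted at zero, the two retrieved cases are the ones that agree with the query on the first two features, whatever their third value. In a twin-system setting, the weights would come from an analysis of the black-box model rather than being set by hand.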
Keywords: Explainable AI, Factual explanation, Trust, User testing, Convolutional neural network, Case-based reasoning, Deep learning, k-nearest neighbours
Review history: Received 27 February 2020, Revised 22 December 2020, Accepted 21 January 2021, Available online 26 January 2021, Version of Record 29 January 2021.
Official paper URL: https://doi.org/10.1016/j.artint.2021.103459