A Confusion Matrix for Evaluating Feature Attribution Methods

The increasing use of deep learning models in critical areas of computer vision, and the consequent need for insights into model behaviour, have led to the development of numerous feature attribution methods. However, these attributions must be both meaningful and plausible to end-users, which is not always the case. Recent research has emphasized the importance of faithfulness in attributions, as plausibility without faithfulness can result in misleading explanations and incorrect decisions. In this work, we propose a novel approach to evaluate the faithfulness of feature attribution methods by constructing an 'Attribution Confusion Matrix', which allows us to leverage a wide range of existing metrics from the traditional confusion matrix. This approach effectively introduces multiple evaluation measures for faithfulness in feature attribution methods within a unified and consistent framework. We demonstrate the effectiveness of our approach on various datasets, attribution methods, and models, emphasizing the importance of faithfulness in generating plausible and reliable explanations while also illustrating the distinct behaviour of different feature attribution methods.

Keywords: Explainable AI, feature attributions, explainable computer vision
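
The abstract does not spell out how the Attribution Confusion Matrix is built, so the following is only a hypothetical sketch of one plausible construction: binarize each feature's attribution score (the method's claim of importance) against the measured change in model output when that feature is perturbed (a proxy for actual importance), count the four confusion-matrix cells, and then derive standard metrics such as precision and recall. The function names, thresholds, and the perturbation-based notion of "true" importance are all assumptions, not the paper's definitions.

```python
import numpy as np

def attribution_confusion_matrix(attributions, effects,
                                 attr_thresh=0.0, eff_thresh=0.0):
    """Hypothetical sketch of an attribution confusion matrix.

    attributions : per-feature scores from an attribution method
    effects      : per-feature change in model output when the
                   feature is perturbed (assumed proxy for ground truth)
    Returns (tp, fp, fn, tn) counts over features.
    """
    # Features the attribution method flags as important.
    pred_important = np.asarray(attributions) > attr_thresh
    # Features whose perturbation actually changes the output.
    true_important = np.asarray(effects) > eff_thresh

    tp = int(np.sum(pred_important & true_important))
    fp = int(np.sum(pred_important & ~true_important))
    fn = int(np.sum(~pred_important & true_important))
    tn = int(np.sum(~pred_important & ~true_important))
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Standard confusion-matrix metrics, reused here for faithfulness."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Under this sketch, any metric defined on a confusion matrix (accuracy, precision, recall, F1, MCC, and so on) becomes a faithfulness measure for the attribution method, which is the unification the abstract describes.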