Related papers: Do Explanations Explain? Model Knows Best

Explanations can be manipulated and geometry is to blame

Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be…

Machine Learning · Statistics 2019-09-26 Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J. Anders, Marcel Ackermann, Klaus-Robert Müller, Pan Kessel

The Intriguing Properties of Model Explanations

Linear approximations to the decision boundary of a complex model have become one of the most popular tools for interpreting predictions. In this paper, we study such linear explanations produced either post-hoc by a few recent methods or…

Machine Learning · Computer Science 2018-01-31 Maruan Al-Shedivat, Avinava Dubey, Eric P. Xing

Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

For AI systems to garner widespread public acceptance, we must develop methods capable of explaining the decisions of black-box models such as neural networks. In this work, we identify two issues of current explanatory methods. First, we…

Computation and Language · Computer Science 2019-12-06 Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

Evaluating Explanations: How much do explanations from the teacher aid students?

While many methods purport to explain predictions by highlighting salient features, what aims these explanations serve and how they ought to be evaluated often go unstated. In this work, we introduce a framework to quantify the value of…

Computation and Language · Computer Science 2021-12-20 Danish Pruthi, Rachit Bansal, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, William W. Cohen

Explanations are a Means to an End: Decision Theoretic Explanation Evaluation

Explanations of model behavior are commonly evaluated via proxy properties weakly tied to the purposes explanations serve in practice. We contribute a decision theoretic framework that treats explanations as information signals valued by…

Artificial Intelligence · Computer Science 2026-02-24 Ziyang Guo, Berk Ustun, Jessica Hullman

A Formal Approach to Explainability

We regard explanations as a blending of the input sample and the model's output and offer a few definitions that capture various desired properties of the function that generates these explanations. We study the links between these…

Machine Learning · Computer Science 2020-01-16 Lior Wolf, Tomer Galanti, Tamir Hazan

Towards a Unified Framework for Evaluating Explanations

The challenge of creating interpretable models has been taken up by two main research communities: ML researchers primarily focused on lower-level explainability methods that suit the needs of engineers, and HCI researchers who have more…

Machine Learning · Computer Science 2024-07-16 Juan D. Pinto, Luc Paquette

Training Machine Learning Models by Regularizing their Explanations

Neural networks are among the most accurate supervised learning methods in use today. However, their opacity makes them difficult to trust in critical applications, especially when conditions in training may differ from those in practice.…

Machine Learning · Computer Science 2018-10-03 Andrew Slavin Ross

When Can You Trust Your Explanations? A Robustness Analysis on Feature Importances

Recent legislative regulations have underlined the need for accountable and transparent artificial intelligence systems and have contributed to a growing interest in the Explainable Artificial Intelligence (XAI) field. Nonetheless, the lack…

Machine Learning · Computer Science 2025-10-14 Ilaria Vascotto, Alex Rodriguez, Alessandro Bonaita, Luca Bortolussi

Uncovering the Structure of Explanation Quality with Spectral Analysis

As machine learning models are increasingly considered for high-stakes domains, effective explanation methods are crucial to ensure that their prediction strategies are transparent to the user. Over the years, numerous metrics have been…

Machine Learning · Computer Science 2025-04-14 Johannes Maeß, Grégoire Montavon, Shinichi Nakajima, Klaus-Robert Müller, Thomas Schnake

Towards Modeling Uncertainties of Self-explaining Neural Networks via Conformal Prediction

Despite the recent progress in deep neural networks (DNNs), it remains challenging to explain the predictions made by DNNs. Existing explanation methods for DNNs mainly focus on post-hoc explanations where another explanatory model is…

Machine Learning · Computer Science 2024-01-04 Wei Qian, Chenxu Zhao, Yangyi Li, Fenglong Ma, Chao Zhang, Mengdi Huai

Aggregating explanation methods for stable and robust explainability

Despite a growing literature on explaining neural networks, no consensus has been reached on how to explain a neural network decision or how to evaluate an explanation. Our contributions in this paper are twofold. First, we investigate…

Machine Learning · Computer Science 2020-03-23 Laura Rieger, Lars Kai Hansen

Explainability Requires Interactivity

When explaining the decisions of deep neural networks, simple stories are tempting but dangerous. Especially in computer vision, the most popular explanation approaches give a false sense of comprehension to their users and provide an overly…

Machine Learning · Computer Science 2021-09-17 Matthias Kirchler, Martin Graf, Marius Kloft, Christoph Lippert

The Mythos of Model Interpretability

Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet…

Machine Learning · Computer Science 2017-03-07 Zachary C. Lipton

Feature Importance Depends on Properties of the Data: Towards Choosing the Correct Explanations for Your Data and Decision Trees based Models

In order to ensure the reliability of the explanations of machine learning models, it is crucial to establish their advantages and limits and in which case each of these methods outperform. However, the current understanding of when and how…

Machine Learning · Computer Science 2025-02-12 Célia Wafa Ayad, Thomas Bonnier, Benjamin Bosch, Sonali Parbhoo, Jesse Read

FICNN: A Framework for the Interpretation of Deep Convolutional Neural Networks

With the continued development of Convolutional Neural Networks (CNNs), there is a growing concern regarding representations that they encode internally. Analyzing these internal representations is referred to as model interpretation. While…

Computer Vision and Pattern Recognition · Computer Science 2023-05-18 Hamed Behzadi-Khormouji, José Oramas

Measuring and improving the quality of visual explanations

The ability to explain neural network decisions goes hand in hand with their safe deployment. Several methods have been proposed to highlight features important for a given network decision. However, there is no consensus on how to…

Computer Vision and Pattern Recognition · Computer Science 2020-03-23 Agnieszka Grabska-Barwińska

From Human Explanation to Model Interpretability: A Framework Based on Weight of Evidence

We take inspiration from the study of human explanation to inform the design and evaluation of interpretability methods in machine learning. First, we survey the literature on human explanation in philosophy, cognitive science, and the…

Artificial Intelligence · Computer Science 2021-09-21 David Alvarez-Melis, Harmanpreet Kaur, Hal Daumé, Hanna Wallach, Jennifer Wortman Vaughan

Improving Network Interpretability via Explanation Consistency Evaluation

While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some…

Computer Vision and Pattern Recognition · Computer Science 2024-08-09 Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin

Robustness of Explanation Methods for NLP Models

Explanation methods have emerged as an important tool to highlight the features responsible for the predictions of neural networks. There is mounting evidence that many explanation methods are rather unreliable and susceptible to malicious…

Computation and Language · Computer Science 2022-06-27 Shriya Atmakuri, Tejas Chheda, Dinesh Kandula, Nishant Yadav, Taesung Lee, Hessel Tuinhof