Related papers: Debugging Tests for Model Explanations

Towards Benchmarking the Utility of Explanations for Model Debugging

Post-hoc explanation methods are an important class of approaches that help understand the rationale underlying a trained model's decision. But how useful are they for an end-user towards accomplishing a given task? In this vision paper, we…

Artificial Intelligence · Computer Science 2021-05-11 Maximilian Idahl, Lijun Lyu, Ujwal Gadiraju, Avishek Anand

Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation

We investigate whether three types of post hoc model explanations--feature attribution, concept activation, and training point ranking--are effective for detecting a model's reliance on spurious signals in the training data. Specifically,…

Machine Learning · Computer Science 2022-12-12 Julius Adebayo, Michael Muelly, Hal Abelson, Been Kim

A Categorisation of Post-hoc Explanations for Predictive Models

The ubiquity of machine learning based predictive models in modern society naturally leads people to ask how trustworthy those models are. In predictive modeling, it is quite common to induce a trade-off between accuracy and…

Machine Learning · Computer Science 2019-04-05 John Mitros, Brian Mac Namee

Right for the Wrong Reason: Can Interpretable ML Techniques Detect Spurious Correlations?

While deep neural network models offer unmatched classification performance, they are prone to learning spurious correlations in the data. Such dependencies on confounding information can be difficult to detect using performance metrics if…

Machine Learning · Computer Science 2023-08-09 Susu Sun, Lisa M. Koch, Christian F. Baumgartner

How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations

Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs) and are inherently based on the assumption that the explanations can be applied independently of how the models were trained.…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Siddhartha Gairola, Moritz Böhle, Francesco Locatello, Bernt Schiele

Explanation-Based Human Debugging of NLP Models: A Survey

Debugging a machine learning model is hard since the bug usually involves the training data and the learning process. This becomes even harder for an opaque deep learning model if we have no clue about how the model actually works. In this…

Computation and Language · Computer Science 2021-12-14 Piyawat Lertvittayakumjorn, Francesca Toni

Don't Treat the Symptom, Find the Cause! Efficient Artificial-Intelligence Methods for (Interactive) Debugging

In the modern world, we are permanently using, leveraging, interacting with, and relying upon systems of ever higher sophistication, ranging from our cars, recommender systems in e-commerce, and networks when we go online, to integrated…

Artificial Intelligence · Computer Science 2023-06-23 Patrick Rodler

Model extraction from counterfactual explanations

Post-hoc explanation techniques refer to a posteriori methods that can be used to explain how black-box machine learning models produce their outcomes. Among post-hoc explanation techniques, counterfactual explanations are becoming one of…

Machine Learning · Computer Science 2020-09-07 Ulrich Aïvodji, Alexandre Bolot, Sébastien Gambs

Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

For AI systems to garner widespread public acceptance, we must develop methods capable of explaining the decisions of black-box models such as neural networks. In this work, we identify two issues of current explanatory methods. First, we…

Computation and Language · Computer Science 2019-12-06 Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

Interpretation of Time-Series Deep Models: A Survey

Deep learning models developed for time-series associated tasks have become more widely researched nowadays. However, due to the unintuitive nature of time-series data, the interpretability problem -- where we understand what is under the…

Machine Learning · Computer Science 2023-05-25 Ziqi Zhao, Yucheng Shi, Shushan Wu, Fan Yang, Wenzhan Song, Ninghao Liu

Informative Perturbation Selection for Uncertainty-Aware Post-hoc Explanations

Trust and ethical concerns due to the widespread deployment of opaque machine learning (ML) models motivate the need for reliable model explanations. Post-hoc model-agnostic explanation methods address this challenge by learning a…

Machine Learning · Computer Science 2026-03-18 Sumedha Chugh, Ranjitha Prasad, Nazreen Shah

Explaining black-box text classifiers for disease-treatment information extraction

Deep neural networks and other intricate Artificial Intelligence (AI) models have reached high levels of accuracy on many biomedical natural language processing tasks. However, their applicability in real-world use cases may be limited due…

Artificial Intelligence · Computer Science 2020-10-22 Milad Moradi, Matthias Samwald

Benchmarking Time-localized Explanations for Audio Classification Models

Most modern approaches for audio processing are opaque, in the sense that they do not provide an explanation for their decisions. For this reason, various methods have been proposed to explain the outputs generated by these models. Good…

Sound · Computer Science 2025-10-21 Cecilia Bolaños, Leonardo Pepino, Martin Meza, Luciana Ferrer

Model-Based Debugging using Multiple Abstract Models

This paper introduces an automatic debugging framework that relies on model-based reasoning techniques to locate faults in programs. In particular, model-based diagnosis, together with an abstract interpretation based conflict detection…

Software Engineering · Computer Science 2007-05-23 Wolfgang Mayer, Markus Stumptner

Explaining Language Models' Predictions with High-Impact Concepts

The emergence of large-scale pretrained language models has posed unprecedented challenges in deriving explanations of why the model has made some predictions. Stemming from the compositional nature of languages, spurious correlations have…

Computation and Language · Computer Science 2023-05-04 Ruochen Zhao, Shafiq Joty, Yongjie Wang, Tan Wang

In Defence of Post-hoc Explainability

This position paper defends post-hoc explainability methods as legitimate tools for scientific knowledge production in machine learning. Addressing criticism of these methods' reliability and epistemic status, we develop a philosophical…

Machine Learning · Computer Science 2025-10-31 Nick Oh

Multicriteria interpretability driven Deep Learning

Deep Learning methods are renowned for their performances, yet their lack of interpretability prevents them from high-stakes contexts. Recent model agnostic methods address this problem by providing post-hoc interpretability methods by…

Machine Learning · Computer Science 2021-11-30 Marco Repetto

Training Machine Learning Models by Regularizing their Explanations

Neural networks are among the most accurate supervised learning methods in use today. However, their opacity makes them difficult to trust in critical applications, especially when conditions in training may differ from those in practice.…

Machine Learning · Computer Science 2018-10-03 Andrew Slavin Ross

Counterfactual Training: Teaching Models Plausible and Actionable Explanations

We propose a novel training regime termed counterfactual training that leverages counterfactual explanations to increase the explanatory capacity of models. Counterfactual explanations have emerged as a popular post-hoc explanation method…

Machine Learning · Computer Science 2026-01-23 Patrick Altmeyer, Aleksander Buszydlik, Arie van Deursen, Cynthia C. S. Liem

Model-based Fault Classification for Automotive Software

Intensive testing using model-based approaches is the standard way of demonstrating the correctness of automotive software. Unfortunately, state-of-the-art techniques leave a crucial and labor intensive task to the test engineer:…

Software Engineering · Computer Science 2022-12-16 Mike Becker, Roland Meyer, Tobias Runge, Ina Schaefer, Sören van der Wall, Sebastian Wolff