Related papers: Regularizing Black-box Models for Improved Interpr…

Regularizing Black-box Models for Improved Interpretability (HILL 2019 Version)

Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade-off accuracy for interpretability, or post-hoc explanation systems, which lack guarantees about their…

Machine Learning · Computer Science 2019-06-05 Gregory Plumb, Maruan Al-Shedivat, Eric Xing, Ameet Talwalkar

A Categorisation of Post-hoc Explanations for Predictive Models

The ubiquity of machine learning based predictive models in modern society naturally leads people to ask how trustworthy those models are? In predictive modeling, it is quite common to induce a trade-off between accuracy and…

Machine Learning · Computer Science 2019-04-05 John Mitros, Brian Mac Namee

An Evaluation of the Human-Interpretability of Explanation

Recent years have seen a boom in interest in machine learning systems that can provide a human-understandable rationale for their predictions or decisions. However, exactly what kinds of explanation are truly human-interpretable remains…

Machine Learning · Computer Science 2019-08-30 Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim, Sam Gershman, Finale Doshi-Velez

Explanation-based Training with Differentiable Insertion/Deletion Metric-aware Regularizers

The quality of explanations for the predictions made by complex machine learning predictors is often measured using insertion and deletion metrics, which assess the faithfulness of the explanations, i.e., how accurately the explanations…

Machine Learning · Computer Science 2024-03-13 Yuya Yoshikawa, Tomoharu Iwata

Interactively Providing Explanations for Transformer Language Models

Transformer language models are state of the art in a multitude of NLP tasks. Despite these successes, their opaqueness remains problematic. Recent methods aiming to provide interpretability and explainability to black-box models primarily…

Computation and Language · Computer Science 2022-03-14 Felix Friedrich, Patrick Schramowski, Christopher Tauchmann, Kristian Kersting

Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

For AI systems to garner widespread public acceptance, we must develop methods capable of explaining the decisions of black-box models such as neural networks. In this work, we identify two issues of current explanatory methods. First, we…

Computation and Language · Computer Science 2019-12-06 Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

Refining Language Models with Compositional Explanations

Pre-trained language models have been successful on text classification tasks, but are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new domain. Prior work reveals such…

Computation and Language · Computer Science 2022-01-03 Huihan Yao, Ying Chen, Qinyuan Ye, Xisen Jin, Xiang Ren

The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations

Machine learning models in safety-critical settings like healthcare are often blackboxes: they contain a large number of parameters which are not transparent to users. Post-hoc explainability methods where a simple, human-interpretable…

Machine Learning · Computer Science 2022-06-03 Aparna Balagopalan, Haoran Zhang, Kimia Hamidieh, Thomas Hartvigsen, Frank Rudzicz, Marzyeh Ghassemi

Post-hoc Interpretability for Neural NLP: A Survey

Neural networks for NLP are becoming increasingly complex and widespread, and there is a growing concern if these models are responsible to use. Explaining models helps to address the safety and ethical concerns and is essential for…

Computation and Language · Computer Science 2023-11-29 Andreas Madsen, Siva Reddy, Sarath Chandar

Towards Robust Interpretability with Self-Explaining Neural Networks

Most recent work on interpretability of complex machine learning models has focused on estimating $\textit{a posteriori}$ explanations for previously trained models around specific predictions. $\textit{Self-explaining}$ models where…

Machine Learning · Computer Science 2018-12-05 David Alvarez-Melis, Tommi S. Jaakkola

A Framework for Evaluating Post Hoc Feature-Additive Explainers

Many applications of data-driven models demand transparency of decisions, especially in health care, criminal justice, and other high-stakes environments. Modern trends in machine learning research have led to algorithms that are…

Machine Learning · Computer Science 2022-05-09 Zachariah Carmichael, Walter J. Scheirer

An interpretable neural network model through piecewise linear approximation

Most existing interpretable methods explain a black-box model in a post-hoc manner, which uses simpler models or data analysis techniques to interpret the predictions after the model is learned. However, they (a) may derive contradictory…

Machine Learning · Computer Science 2020-01-22 Mengzhuo Guo, Qingpeng Zhang, Xiuwu Liao, Daniel Dajun Zeng

Lifting Interpretability-Performance Trade-off via Automated Feature Engineering

Complex black-box predictive models may have high performance, but lack of interpretability causes problems like lack of trust, lack of stability, sensitivity to concept drift. On the other hand, achieving satisfactory accuracy of…

Machine Learning · Computer Science 2020-02-12 Alicja Gosiewska, Przemyslaw Biecek

Training Deep Models to be Explained with Fewer Examples

Although deep models achieve high predictive performance, it is difficult for humans to understand the predictions they made. Explainability is important for real-world applications to justify their reliability. Many example-based…

Machine Learning · Statistics 2021-12-08 Tomoharu Iwata, Yuya Yoshikawa

Optimal Explanations of Linear Models

When predictive models are used to support complex and important decisions, the ability to explain a model's reasoning can increase trust, expose hidden biases, and reduce vulnerability to adversarial attacks. However, attempts at…

Machine Learning · Computer Science 2019-07-11 Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin

An Interpretable Loan Credit Evaluation Method Based on Rule Representation Learner

The interpretability of model has become one of the obstacles to its wide application in the high-stake fields. The usual way to obtain interpretability is to build a black-box first and then explain it using the post-hoc methods. However,…

Machine Learning · Computer Science 2023-04-04 Zihao Chen, Xiaomeng Wang, Yuanjiang Huang, Tao Jia

Interpretability Needs a New Paradigm

Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be…

Machine Learning · Computer Science 2024-11-14 Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar

Shedding Light on Black Box Machine Learning Algorithms: Development of an Axiomatic Framework to Assess the Quality of Methods that Explain Individual Predictions

From self-driving vehicles and back-flipping robots to virtual assistants who book our next appointment at the hair salon or at that restaurant for dinner - machine learning systems are becoming increasingly ubiquitous. The main reason for…

Machine Learning · Computer Science 2018-08-16 Milo Honegger

Critical Empirical Study on Black-box Explanations in AI

This paper provides empirical concerns about post-hoc explanations of black-box ML models, one of the major trends in AI explainability (XAI), by showing its lack of interpretability and societal consequences. Using a representative…

Human-Computer Interaction · Computer Science 2021-10-01 Jean-Marie John-Mathews

ExSum: From Local Explanations to Model Understanding

Interpretability methods are developed to understand the working mechanisms of black-box models, which is crucial to their responsible deployment. Fulfilling this goal requires both that the explanations generated by these methods are…

Computation and Language · Computer Science 2022-05-03 Yilun Zhou, Marco Tulio Ribeiro, Julie Shah