Related papers: Measuring algorithmic interpretability: A human-le…
Decisions by Machine Learning (ML) models have become ubiquitous. Trusting these decisions requires understanding how algorithms take them. Hence interpretability methods for ML are an active focus of research. A central problem in this…
Despite the growing body of work in interpretable machine learning, it remains unclear how to evaluate different explainability methods without resorting to qualitative assessment and user-studies. While interpretability is an inherently…
Existing approaches for generating human-aware agent behaviors have considered different measures of interpretability in isolation. Further, these measures have been studied under differing assumptions, thus precluding the possibility of…
We take inspiration from the study of human explanation to inform the design and evaluation of interpretability methods in machine learning. First, we survey the literature on human explanation in philosophy, cognitive science, and the…
The increasing adoption of machine learning tools has led to calls for accountability via model interpretability. But what does it mean for a machine learning model to be interpretable by humans, and how can this be assessed? We focus on…
We provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. Our key insight is that interpretability is not an absolute concept and so we define it relative to a target model,…
To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive…
Machine learning (ML) models have been applied to a wide range of natural language processing (NLP) tasks in recent years. In addition to making accurate decisions, the necessity of understanding how models make their decisions has become…
In recent years, machine learning researchers have focused on methods to construct flexible and interpretable prediction models. However, an interpretability evaluation, a relationship between generalization performance and an…
Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications.…
This paper outlines a theoretical framework using which different automatic metrics can be designed for evaluation of Machine Translation systems. It introduces the concept of {\em cognitive ease} which depends on {\em adequacy} and {\em…
Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be…
Fairness and interpretability play an important role in the adoption of decision-making algorithms across many application domains. These requirements are intended to avoid undesirable group differences and to alleviate concerns related to…
When quantitative models are used to support decision-making on complex and important topics, understanding a model's ``reasoning'' can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks.…
Mechanistic interpretability (MI) is an emerging framework for interpreting neural networks. Given a task and model, MI aims to discover a succinct algorithmic process, an interpretation, that explains the model's decision process on that…
Background: Developers spend a lot of their time on understanding source code. Static code analysis tools can draw attention to code that is difficult for developers to understand. However, most of the findings are based on non-validated…
Current machine learning models are evaluated through behavioral snapshots, with benchmark accuracies, win rates and outcome-based metrics. Model explanations and evaluations, however, are fundamentally intertwined: understanding why a…
As machine learning algorithms getting adopted in an ever-increasing number of applications, interpretation has emerged as a crucial desideratum. In this paper, we propose a mathematical definition for the human-interpretable model. In…
We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In…
Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction…