Related papers: Measuring algorithmic interpretability: A human-le…

Quantifying Interpretability and Trust in Machine Learning Systems

Decisions by Machine Learning (ML) models have become ubiquitous. Trusting these decisions requires understanding how algorithms take them. Hence interpretability methods for ML are an active focus of research. A central problem in this…

Machine Learning · Computer Science 2019-01-25 Philipp Schmidt, Felix Biessmann

On quantitative aspects of model interpretability

Despite the growing body of work in interpretable machine learning, it remains unclear how to evaluate different explainability methods without resorting to qualitative assessment and user-studies. While interpretability is an inherently…

Machine Learning · Computer Science 2020-07-16 An-phi Nguyen, María Rodríguez Martínez

A Unifying Bayesian Formulation of Measures of Interpretability in Human-AI

Existing approaches for generating human-aware agent behaviors have considered different measures of interpretability in isolation. Further, these measures have been studied under differing assumptions, thus precluding the possibility of…

Artificial Intelligence · Computer Science 2021-04-23 Sarath Sreedharan, Anagha Kulkarni, David E. Smith, Subbarao Kambhampati

From Human Explanation to Model Interpretability: A Framework Based on Weight of Evidence

We take inspiration from the study of human explanation to inform the design and evaluation of interpretability methods in machine learning. First, we survey the literature on human explanation in philosophy, cognitive science, and the…

Artificial Intelligence · Computer Science 2021-09-21 David Alvarez-Melis, Harmanpreet Kaur, Hal Daumé, Hanna Wallach, Jennifer Wortman Vaughan

Assessing the Local Interpretability of Machine Learning Models

The increasing adoption of machine learning tools has led to calls for accountability via model interpretability. But what does it mean for a machine learning model to be interpretable by humans, and how can this be assessed? We focus on…

Machine Learning · Computer Science 2019-08-06 Dylan Slack, Sorelle A. Friedler, Carlos Scheidegger, Chitradeep Dutta Roy

A Formal Framework to Characterize Interpretability of Procedures

We provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. Our key insight is that interpretability is not an absolute concept and so we define it relative to a target model,…

Artificial Intelligence · Computer Science 2017-07-14 Amit Dhurandhar, Vijay Iyengar, Ronny Luss, Karthikeyan Shanmugam

A Framework to Learn with Interpretation

To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive…

Machine Learning · Computer Science 2022-02-24 Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc

Interpretation Quality Score for Measuring the Quality of interpretability methods

Machine learning (ML) models have been applied to a wide range of natural language processing (NLP) tasks in recent years. In addition to making accurate decisions, the necessity of understanding how models make their decisions has become…

Computation and Language · Computer Science 2023-11-02 Sean Xie, Soroush Vosoughi, Saeed Hassanpour

How to improve the interpretability of kernel learning

In recent years, machine learning researchers have focused on methods to construct flexible and interpretable prediction models. However, an interpretability evaluation, a relationship between generalization performance and an…

Machine Learning · Computer Science 2019-10-08 Jinwei Zhao, Qizhou Wang, Yufei Wang, Yu Liu, Zhenghao Shi, Xinhong Hei

The Promise and Peril of Human Evaluation for Model Interpretability

Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications.…

Artificial Intelligence · Computer Science 2019-10-31 Bernease Herman

Evaluating MT Systems: A Theoretical Framework

This paper outlines a theoretical framework using which different automatic metrics can be designed for evaluation of Machine Translation systems. It introduces the concept of cognitive ease, which depends on adequacy and…

Computation and Language · Computer Science 2022-02-14 Rajeev Sangal

From Mechanistic to Compositional Interpretability

Mechanistic interpretability aims to explain neural model behaviour by reverse-engineering learned computational structure into human-understandable components. Without a formal framework, however, mechanistic explanations cannot be…

Machine Learning · Computer Science 2026-05-12 Ward Gauderis, Thomas Dooms, Steven T. Holmer, Kola Ayonrinde, Geraint A. Wiggins

Fairness-Aware and Interpretable Policy Learning

Fairness and interpretability play an important role in the adoption of decision-making algorithms across many application domains. These requirements are intended to avoid undesirable group differences and to alleviate concerns related to…

Econometrics · Economics 2025-09-16 Nora Bearth, Michael Lechner, Jana Mareckova, Fabian Muny

The Price of Interpretability

When quantitative models are used to support decision-making on complex and important topics, understanding a model's "reasoning" can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks.…

Machine Learning · Computer Science 2019-07-09 Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin

Tracking Equivalent Mechanistic Interpretations Across Neural Networks

Mechanistic interpretability (MI) is an emerging framework for interpreting neural networks. Given a task and model, MI aims to discover a succinct algorithmic process, an interpretation, that explains the model's decision process on that…

Machine Learning · Computer Science 2026-04-01 Alan Sun, Mariya Toneva

An Empirical Validation of Cognitive Complexity as a Measure of Source Code Understandability

Background: Developers spend a lot of their time on understanding source code. Static code analysis tools can draw attention to code that is difficult for developers to understand. However, most of the findings are based on non-validated…

Software Engineering · Computer Science 2020-07-27 Marvin Muñoz Barón, Marvin Wyrich, Stefan Wagner

Rigorous Interpretation Is a Form of Evaluation

Current machine learning models are evaluated through behavioral snapshots, with benchmark accuracies, win rates and outcome-based metrics. Model explanations and evaluations, however, are fundamentally intertwined: understanding why a…

Computers and Society · Computer Science 2026-05-08 Isabelle Lee, Emmy Liu, Cathy Jiao, Brihi Joshi, Dani Yogatama, Fazl Barez, Michael Saxon

The Definitions of Interpretability and Learning of Interpretable Models

As machine learning algorithms are getting adopted in an ever-increasing number of applications, interpretation has emerged as a crucial desideratum. In this paper, we propose a mathematical definition for the human-interpretable model. In…

Machine Learning · Computer Science 2021-06-01 Weishen Pan, Changshui Zhang

Human-in-the-Loop Interpretability Prior

We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In…

Machine Learning · Statistics 2018-11-01 Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez

Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond

Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction…

Machine Learning · Computer Science 2022-07-18 Xuhong Li, Haoyi Xiong, Xingjian Li, Xuanyu Wu, Xiao Zhang, Ji Liu, Jiang Bian, Dejing Dou