Related papers: Model Learning with Personalized Interpretability …

Learning a Formula of Interpretability to Learn Interpretable Formulas

Many risk-sensitive applications require Machine Learning (ML) models to be interpretable. Attempts to obtain interpretable models typically rely on tuning, by trial-and-error, hyper-parameters of model complexity that are only loosely…

Machine Learning · Computer Science 2020-05-29 Marco Virgolin, Andrea De Lorenzo, Eric Medvet, Francesca Randone

A Decision-Theoretic Approach for Model Interpretability in Bayesian Framework

A salient approach to interpretable machine learning is to restrict modeling to simple models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally,…

Machine Learning · Computer Science 2020-09-08 Homayun Afrabandpey, Tomi Peltola, Juho Piironen, Aki Vehtari, Samuel Kaski

Learning Interpretable Models with Causal Guarantees

Machine learning has shown much promise in helping improve the quality of medical, legal, and financial decision-making. In these applications, machine learning models must satisfy two important criteria: (i) they must be causal, since the…

Machine Learning · Computer Science 2021-10-12 Carolyn Kim, Osbert Bastani

Explainability for Machine Learning Models: From Data Adaptability to User Perception

This thesis explores the generation of local explanations for already deployed machine learning models, aiming to identify optimal conditions for producing meaningful explanations considering both data and user requirements. The primary…

Artificial Intelligence · Computer Science 2024-02-19 Julien Delaunay

Partially Interpretable Estimators (PIE): Black-Box-Refined Interpretable Machine Learning

We propose Partially Interpretable Estimators (PIE) which attribute a prediction to individual features via an interpretable model, while a (possibly) small part of the PIE prediction is attributed to the interaction of features via a…

Machine Learning · Computer Science 2021-05-07 Tong Wang, Jingyi Yang, Yunyi Li, Boxiang Wang

Interpretability of machine learning based prediction models in healthcare

There is a need to ensure that machine learning models are interpretable. Higher interpretability of the model means easier comprehension and explanation of future predictions for end-users. Further, interpretable machine learning models…

Machine Learning · Computer Science 2020-08-17 Gregor Stiglic, Primoz Kocbek, Nino Fijacko, Marinka Zitnik, Katrien Verbert, Leona Cilar

The Definitions of Interpretability and Learning of Interpretable Models

As machine learning algorithms are adopted in an ever-increasing number of applications, interpretation has emerged as a crucial desideratum. In this paper, we propose a mathematical definition for the human-interpretable model. In…

Machine Learning · Computer Science 2021-06-01 Weishen Pan, Changshui Zhang

Human-in-the-Loop Interpretability Prior

We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In…

Machine Learning · Statistics 2018-11-01 Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez

Assessing the Local Interpretability of Machine Learning Models

The increasing adoption of machine learning tools has led to calls for accountability via model interpretability. But what does it mean for a machine learning model to be interpretable by humans, and how can this be assessed? We focus on…

Machine Learning · Computer Science 2019-08-06 Dylan Slack, Sorelle A. Friedler, Carlos Scheidegger, Chitradeep Dutta Roy

HyPerAlign: Interpretable Personalized LLM Alignment via Hypothesis Generation

Alignment algorithms are widely used to align large language models (LLMs) to human users based on preference annotations. Typically these (often divergent) preferences are aggregated over a diverse set of users, resulting in fine-tuned…

Computation and Language · Computer Science 2025-05-21 Cristina Garbacea, Chenhao Tan

Interpretable by Design: Learning Predictors by Composing Interpretable Queries

There is a growing concern about typically opaque decision-making with high-performance machine learning algorithms. Providing an explanation of the reasoning process in domain-specific terms can be crucial for adoption in risk-sensitive…

Computer Vision and Pattern Recognition · Computer Science 2022-11-28 Aditya Chattopadhyay, Stewart Slocum, Benjamin D. Haeffele, Rene Vidal, Donald Geman

Interpretability with Accurate Small Models

Models often need to be constrained to a certain size for them to be considered interpretable. For example, a decision tree of depth 5 is much easier to understand than one of depth 50. Limiting model size, however, often reduces accuracy.…

Machine Learning · Computer Science 2020-07-02 Abhishek Ghose, Balaraman Ravindran

Interpretable Mixture of Experts

The need for reliable model explanations is prominent for many machine learning applications, particularly for tabular and time-series data as their use cases often involve high-stakes decision making. Towards this goal, we introduce a…

Machine Learning · Computer Science 2023-05-29 Aya Abdelsalam Ismail, Sercan Ö. Arik, Jinsung Yoon, Ankur Taly, Soheil Feizi, Tomas Pfister

Techniques for Interpretable Machine Learning

Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a…

Machine Learning · Computer Science 2019-05-21 Mengnan Du, Ninghao Liu, Xia Hu

Manipulating and Measuring Model Interpretability

With machine learning models being increasingly used to aid decision making even in high-stakes domains, there has been a growing interest in developing interpretable models. Although many supposedly interpretable models have been proposed,…

Artificial Intelligence · Computer Science 2021-08-17 Forough Poursabzi-Sangdeh, Daniel G. Goldstein, Jake M. Hofman, Jennifer Wortman Vaughan, Hanna Wallach

Model-Agnostic Interpretability of Machine Learning

Understanding why machine learning models behave the way they do empowers both system designers and end-users in many ways: in model selection, feature engineering, in order to trust and act upon the predictions, and in more intuitive user…

Machine Learning · Statistics 2016-06-20 Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

On The Stability of Interpretable Models

Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or…

Machine Learning · Computer Science 2019-03-18 Riccardo Guidotti, Salvatore Ruggieri

Tracking Equivalent Mechanistic Interpretations Across Neural Networks

Mechanistic interpretability (MI) is an emerging framework for interpreting neural networks. Given a task and model, MI aims to discover a succinct algorithmic process, an interpretation, that explains the model's decision process on that…

Machine Learning · Computer Science 2026-04-01 Alan Sun, Mariya Toneva

The Price of Interpretability

When quantitative models are used to support decision-making on complex and important topics, understanding a model's "reasoning" can increase trust in its predictions, expose hidden biases, or reduce vulnerability to adversarial attacks.…

Machine Learning · Computer Science 2019-07-09 Dimitris Bertsimas, Arthur Delarue, Patrick Jaillet, Sebastien Martin

Interpretable Companions for Black-Box Models

We present an interpretable companion model for any pre-trained black-box classifiers. The idea is that for any input, a user can decide to either receive a prediction from the black-box model, with high accuracy but no explanations, or…

Machine Learning · Statistics 2020-02-12 Danqing Pan, Tong Wang, Satoshi Hara