Related papers: Interpretability Can Be Actionable

A Survey on Neural Network Interpretability

Along with the great success of deep neural networks, there is also growing concern about their black-box nature. The interpretability issue affects people's trust on deep learning systems. It is also related to many ethical problems, e.g.,…

Machine Learning · Computer Science 2022-02-01 Yu Zhang, Peter Tiňo, Aleš Leonardis, Ke Tang

Foundations of Interpretable Models

We argue that existing definitions of interpretability are not actionable in that they fail to inform users about general, sound, and robust interpretable model design. This makes current interpretability research fundamentally ill-posed.…

Machine Learning · Computer Science 2025-08-04 Pietro Barbiero, Mateo Espinosa Zarlenga, Alberto Termine, Mateja Jamnik, Giuseppe Marra

Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond

Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction…

Machine Learning · Computer Science 2022-07-18 Xuhong Li, Haoyi Xiong, Xingjian Li, Xuanyu Wu, Xiao Zhang, Ji Liu, Jiang Bian, Dejing Dou

Techniques for Interpretable Machine Learning

Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a…

Machine Learning · Computer Science 2019-05-21 Mengnan Du, Ninghao Liu, Xia Hu

Actionable Interpretability Must Be Defined in Terms of Symmetries

This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit…

Artificial Intelligence · Computer Science 2026-01-30 Pietro Barbiero, Mateo Espinosa Zarlenga, Francesco Giannini, Alberto Termine, Filippo Bonchi, Mateja Jamnik, Giuseppe Marra

A Comprehensive Survey on Self-Interpretable Neural Networks

Neural networks have achieved remarkable success across various fields. However, the lack of interpretability limits their practical use, particularly in critical decision-making scenarios. Post-hoc interpretability, which provides…

Machine Learning · Computer Science 2025-11-21 Yang Ji, Ying Sun, Yuting Zhang, Zhigaoyuan Wang, Yuanxin Zhuang, Zheng Gong, Dazhong Shen, Chuan Qin, Hengshu Zhu, Hui Xiong

The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis

Interpretability provides a toolset for understanding how and why neural networks behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making…

Machine Learning · Computer Science 2025-10-01 Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov

Towards falsifiable interpretability research

Methods for understanding the decisions of and mechanisms underlying deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, methods aim to visualize…

Computers and Society · Computer Science 2020-10-26 Matthew L. Leavitt, Ari Morcos

Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications

With the broader and highly successful usage of machine learning in industry and the sciences, there has been a growing demand for Explainable AI. Interpretability and explanation methods for gaining a better understanding about the problem…

Machine Learning · Computer Science 2021-02-26 Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Christopher J. Anders, Klaus-Robert Müller

Open Problems in Mechanistic Interpretability

Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater…

Machine Learning · Computer Science 2025-01-29 Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Bloom, Stella Biderman, Adria Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, Eric J. Michaud, Stephen Casper, Max Tegmark, William Saunders, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Tom McGrath

The Mythos of Model Interpretability

Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet…

Machine Learning · Computer Science 2017-03-07 Zachary C. Lipton

A Formal Framework to Characterize Interpretability of Procedures

We provide a novel notion of what it means to be interpretable, looking past the usual association with human understanding. Our key insight is that interpretability is not an absolute concept and so we define it relative to a target model,…

Artificial Intelligence · Computer Science 2017-07-14 Amit Dhurandhar, Vijay Iyengar, Ronny Luss, Karthikeyan Shanmugam

On the Relationship Between Interpretability and Explainability in Machine Learning

Interpretability and explainability have gained more and more attention in the field of machine learning as they are crucial when it comes to high-stakes decisions and troubleshooting. Since both provide information about predictors and…

Machine Learning · Computer Science 2024-04-26 Benjamin Leblanc, Pascal Germain

Rigorous Interpretation Is a Form of Evaluation

Current machine learning models are evaluated through behavioral snapshots, with benchmark accuracies, win rates and outcome-based metrics. Model explanations and evaluations, however, are fundamentally intertwined: understanding why a…

Computers and Society · Computer Science 2026-05-08 Isabelle Lee, Emmy Liu, Cathy Jiao, Brihi Joshi, Dani Yogatama, Fazl Barez, Michael Saxon

Interpretable Artificial Intelligence through the Lens of Feature Interaction

Interpretation of deep learning models is a very challenging problem because of their large number of parameters, complex connections between nodes, and unintelligible feature representations. Despite this, many view interpretability as a…

Machine Learning · Computer Science 2021-03-05 Michael Tsang, James Enouen, Yan Liu

Deep Interpretable Models of Theory of Mind

When developing AI systems that interact with humans, it is essential to design both a system that can understand humans, and a system that humans can understand. Most deep network based agent-modeling approaches are 1) not interpretable…

Machine Learning · Computer Science 2021-07-14 Ini Oguntola, Dana Hughes, Katia Sycara

Evaluating Interpretable Reinforcement Learning by Distilling Policies into Programs

There exist applications of reinforcement learning like medicine where policies need to be "interpretable" by humans. User studies have shown that some policy classes might be more interpretable than others. However, it is costly to…

Machine Learning · Computer Science 2025-03-12 Hector Kohler, Quentin Delfosse, Waris Radji, Riad Akrour, Philippe Preux

Interpretability Needs a New Paradigm

Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be…

Machine Learning · Computer Science 2024-11-14 Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar

On Interpretability of Artificial Neural Networks: A Survey

Deep learning as represented by the artificial deep neural networks (DNNs) has achieved great success in many important areas that deal with text, images, videos, graphs, and so on. However, the black-box nature of DNNs has become one of…

Machine Learning · Computer Science 2021-09-29 Fenglei Fan, Jinjun Xiong, Mengzhou Li, Ge Wang

Transparent AI: The Case for Interpretability and Explainability

As artificial intelligence systems increasingly inform high-stakes decisions across sectors, transparency has become foundational to responsible and trustworthy AI implementation. Leveraging our role as a leading institute in advancing AI…

Machine Learning · Computer Science 2025-08-01 Dhanesh Ramachandram, Himanshu Joshi, Judy Zhu, Dhari Gandhi, Lucas Hartman, Ananya Raval