Related papers: Propositional Interpretability in Artificial Intel…

Altruist: Argumentative Explanations through Local Interpretations of Predictive Models

Explainable AI is an emerging field providing solutions for acquiring insights into automated systems' rationale. It has been put on the AI map by suggesting ways to tackle key ethical and societal issues. Existing explanation techniques…

Machine Learning · Computer Science 2022-05-02 Ioannis Mollas, Nick Bassiliades, Grigorios Tsoumakas

Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience

Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question…

Artificial Intelligence · Computer Science 2024-08-01 Martina G. Vilas, Federico Adolfi, David Poeppel, Gemma Roig

On the Semantic Interpretability of Artificial Intelligence Models

Artificial Intelligence models are becoming increasingly more powerful and accurate, supporting or even replacing humans' decision making. But with increased power and accuracy also comes higher complexity, making it hard for users to…

Artificial Intelligence · Computer Science 2019-07-10 Vivian S. Silva, André Freitas, Siegfried Handschuh

Mechanistic Interpretability for AI Safety -- A Review

Understanding AI systems' inner workings is critical for ensuring value alignment and safety. This review explores mechanistic interpretability: reverse engineering the computational mechanisms and representations learned by neural networks…

Artificial Intelligence · Computer Science 2024-08-27 Leonard Bereska, Efstratios Gavves

Mechanistic Interpretability Needs Philosophy

Mechanistic interpretability (MI) aims to explain how neural networks work by uncovering their underlying mechanisms. As the field grows in influence, it is increasingly important to examine not just models themselves, but the assumptions,…

Computation and Language · Computer Science 2026-05-20 Iwan Williams, Ninell Oldenburg, Ruchira Dhar, Joshua Hatherley, Constanza Fierro, Nina Rajcic, Sandrine R. Schiller, Filippos Stamatiou, Anders Søgaard

Open Problems in Mechanistic Interpretability

Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater…

Machine Learning · Computer Science 2025-01-29 Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Bloom, Stella Biderman, Adria Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, Eric J. Michaud, Stephen Casper, Max Tegmark, William Saunders, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Tom McGrath

Transparent AI: The Case for Interpretability and Explainability

As artificial intelligence systems increasingly inform high-stakes decisions across sectors, transparency has become foundational to responsible and trustworthy AI implementation. Leveraging our role as a leading institute in advancing AI…

Machine Learning · Computer Science 2025-08-01 Dhanesh Ramachandram, Himanshu Joshi, Judy Zhu, Dhari Gandhi, Lucas Hartman, Ananya Raval

The Pragmatic Turn in Explainable Artificial Intelligence (XAI)

In this paper I argue that the search for explainable models and interpretable decisions in AI must be reformulated in terms of the broader project of offering a pragmatic and naturalistic account of understanding in AI. Intuitively, the…

Artificial Intelligence · Computer Science 2020-06-23 Andrés Páez

The Quest for Interpretable and Responsible Artificial Intelligence

Artificial Intelligence (AI) provides many opportunities to improve private and public life. Discovering patterns and structures in large troves of data in an automated manner is a core component of data science, and currently drives…

Artificial Intelligence · Computer Science 2019-10-11 Vaishak Belle

What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play

Machine learning is an important tool for decision making, but its ethical and responsible application requires rigorous vetting of its interpretability and utility: an understudied problem, particularly for natural language processing…

Artificial Intelligence · Computer Science 2019-06-11 Shi Feng, Jordan Boyd-Graber

Cybertrust: From Explainable to Actionable and Interpretable AI (AI2)

To benefit from AI advances, users and operators of AI systems must have reason to trust it. Trust arises from multiple interactions, where predictable and desirable behavior is reinforced over time. Providing the system's users with some…

Artificial Intelligence · Computer Science 2022-01-27 Stephanie Galaitsi, Benjamin D. Trump, Jeffrey M. Keisler, Igor Linkov, Alexander Kott

Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this…

Artificial Intelligence · Computer Science 2017-08-29 Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller

The Challenge of Crafting Intelligible Intelligence

Since Artificial Intelligence (AI) software uses techniques like deep lookahead search and stochastic optimization of huge neural networks to fit mammoth datasets, it often results in complex behavior that is difficult for people to…

Artificial Intelligence · Computer Science 2018-10-16 Daniel S. Weld, Gagan Bansal

Moral Dilemmas for Artificial Intelligence: a position paper on an application of Compositional Quantum Cognition

Traditionally, the way one evaluates the performance of an Artificial Intelligence (AI) system is via a comparison to human performance in specific tasks, treating humans as a reference for high-level cognition. However, these comparisons…

Artificial Intelligence · Computer Science 2019-11-25 Camilo M. Signorelli, Xerxes D. Arsiwalla

Some Critical and Ethical Perspectives on the Empirical Turn of AI Interpretability

We consider two fundamental and related issues currently faced by Artificial Intelligence (AI) development: the lack of ethics and interpretability of AI decisions. Can interpretable AI decisions help to address ethics in AI? Using a…

Artificial Intelligence · Computer Science 2021-09-21 Jean-Marie John-Mathews

A Unifying Framework for Learning Argumentation Semantics

Argumentation is a very active research field of Artificial Intelligence concerned with the representation and evaluation of arguments used in dialogues between humans and/or artificial agents. Acceptability semantics of formal…

Artificial Intelligence · Computer Science 2025-03-05 Zlatina Mileva, Antonis Bikakis, Fabio Aurelio D'Asaro, Mark Law, Alessandra Russo

Techniques for Interpretable Machine Learning

Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a…

Machine Learning · Computer Science 2019-05-21 Mengnan Du, Ninghao Liu, Xia Hu

Explainability Through Systematicity: The Hard Systematicity Challenge for Artificial Intelligence

This paper argues that explainability is only one facet of a broader ideal that shapes our expectations towards artificial intelligence (AI). Fundamentally, the issue is to what extent AI exhibits systematicity--not merely in being…

Artificial Intelligence · Computer Science 2025-07-31 Matthieu Queloz

Challenging common interpretability assumptions in feature attribution explanations

As machine learning and algorithmic decision making systems are increasingly being leveraged in high-stakes human-in-the-loop settings, there is a pressing need to understand the rationale of their predictions. Researchers have responded to…

Machine Learning · Computer Science 2020-12-07 Jonathan Dinu, Jeffrey Bigham, J. Zico Kolter

Explainability Case Studies

Explainability is one of the key ethical concepts in the design of AI systems. However, attempts to operationalize this concept thus far have tended to focus on approaches such as new software for model interpretability or guidelines with…

Computers and Society · Computer Science 2020-10-06 Ben Zevenbergen, Allison Woodruff, Patrick Gage Kelley