Related papers: Propositional Interpretability in Artificial Intel…
Explainable AI is an emerging field providing solutions for acquiring insights into automated systems' rationale. It has been put on the AI map by suggesting ways to tackle key ethical and societal issues. Existing explanation techniques…
Inner Interpretability is a promising emerging field tasked with uncovering the inner mechanisms of AI systems, though how to develop these mechanistic theories is still much debated. Moreover, recent critiques raise issues that question…
Artificial Intelligence models are becoming increasingly more powerful and accurate, supporting or even replacing humans' decision making. But with increased power and accuracy also comes higher complexity, making it hard for users to…
Understanding AI systems' inner workings is critical for ensuring value alignment and safety. This review explores mechanistic interpretability: reverse engineering the computational mechanisms and representations learned by neural networks…
Mechanistic interpretability (MI) aims to explain how neural networks work by uncovering their underlying mechanisms. As the field grows in influence, it is increasingly important to examine not just models themselves, but the assumptions,…
Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater…
As artificial intelligence systems increasingly inform high-stakes decisions across sectors, transparency has become foundational to responsible and trustworthy AI implementation. Leveraging our role as a leading institute in advancing AI…
In this paper I argue that the search for explainable models and interpretable decisions in AI must be reformulated in terms of the broader project of offering a pragmatic and naturalistic account of understanding in AI. Intuitively, the…
Artificial Intelligence (AI) provides many opportunities to improve private and public life. Discovering patterns and structures in large troves of data in an automated manner is a core component of data science, and currently drives…
Machine learning is an important tool for decision making, but its ethical and responsible application requires rigorous vetting of its interpretability and utility: an understudied problem, particularly for natural language processing…
To benefit from AI advances, users and operators of AI systems must have reason to trust it. Trust arises from multiple interactions, where predictable and desirable behavior is reinforced over time. Providing the system's users with some…
With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this…
Since Artificial Intelligence (AI) software uses techniques like deep lookahead search and stochastic optimization of huge neural networks to fit mammoth datasets, it often results in complex behavior that is difficult for people to…
Traditionally, the way one evaluates the performance of an Artificial Intelligence (AI) system is via a comparison to human performance in specific tasks, treating humans as a reference for high-level cognition. However, these comparisons…
We consider two fundamental and related issues currently faced by Artificial Intelligence (AI) development: the lack of ethics and interpretability of AI decisions. Can interpretable AI decisions help to address ethics in AI? Using a…
Argumentation is a very active research field of Artificial Intelligence concerned with the representation and evaluation of arguments used in dialogues between humans and/or artificial agents. Acceptability semantics of formal…
Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a…
This paper argues that explainability is only one facet of a broader ideal that shapes our expectations towards artificial intelligence (AI). Fundamentally, the issue is to what extent AI exhibits systematicity--not merely in being…
As machine learning and algorithmic decision making systems are increasingly being leveraged in high-stakes human-in-the-loop settings, there is a pressing need to understand the rationale of their predictions. Researchers have responded to…
Explainability is one of the key ethical concepts in the design of AI systems. However, attempts to operationalize this concept thus far have tended to focus on approaches such as new software for model interpretability or guidelines with…