Related papers: Unifying Attribution-Based Explanations Using Func…

Explaining by Removing: A Unified Framework for Model Explanation

Researchers have proposed a wide variety of model explanation approaches, but it remains unclear how most methods are related or when one method is preferable to another. We describe a new unified class of methods, removal-based…

Machine Learning · Computer Science 2022-05-16 Ian Covert, Scott Lundberg, Su-In Lee

Towards Unified Attribution in Explainable AI, Data-Centric AI, and Mechanistic Interpretability

The increasing complexity of AI systems has made understanding their behavior critical. Numerous interpretability methods have been developed to attribute model behavior to three key aspects: input features, training data, and internal…

Machine Learning · Computer Science 2025-05-30 Shichang Zhang, Tessa Han, Usha Bhalla, Himabindu Lakkaraju

Unifying Feature-Based Explanations with Functional ANOVA and Cooperative Game Theory

Feature-based explanations, using perturbations or gradients, are a prevalent tool to understand decisions of black box machine learning models. Yet, differences between these methods still remain mostly unknown, which limits their…

Machine Learning · Computer Science 2025-04-18 Fabian Fumagalli, Maximilian Muschalik, Eyke Hüllermeier, Barbara Hammer, Julia Herbinger

Distributing Synergy Functions: Unifying Game-Theoretic Interaction Methods for Machine-Learning Explainability

Deep learning has revolutionized many areas of machine learning, from computer vision to natural language processing, but these high-performance models are generally "black box." Explaining such models would improve transparency and trust…

Machine Learning · Computer Science 2023-05-18 Daniel Lundstrom, Meisam Razaviyayn

Benchmarking Attribution Methods with Relative Feature Importance

Interpretability is an important area of research for safe deployment of machine learning systems. One particular type of interpretability method attributes model decisions to input features. Despite active development, quantitative…

Machine Learning · Computer Science 2019-11-06 Mengjiao Yang, Been Kim

Regionally Additive Models: Explainable-by-design models minimizing feature interactions

Generalized Additive Models (GAMs) are widely used explainable-by-design models in various applications. GAMs assume that the output can be represented as a sum of univariate functions, referred to as components. However, this assumption…

Machine Learning · Computer Science 2023-09-22 Vasilis Gkolemis, Anargiros Tzerefos, Theodore Dalamagas, Eirini Ntoutsi, Christos Diou

Unveiling Concept Attribution in Diffusion Models

Diffusion models have shown remarkable abilities in generating realistic and high-quality images from text prompts. However, a trained model remains largely black-box; little do we know about the roles of its components in exhibiting a…

Computer Vision and Pattern Recognition · Computer Science 2025-10-29 Quang H. Nguyen, Hoang Phan, Khoa D. Doan

Distribution-Based Feature Attribution for Explaining the Predictions of Any Classifier

The proliferation of complex, black-box AI models has intensified the need for techniques that can explain their decisions. Feature attribution methods have become a popular solution for providing post-hoc explanations, yet the field has…

Machine Learning · Computer Science 2025-11-13 Xinpeng Li, Kai Ming Ting

Selective Explanations

Feature attribution methods explain black-box machine learning (ML) models by assigning importance scores to input features. These methods can be computationally expensive for large ML models. To address this challenge, there has been…

Computers and Society · Computer Science 2024-05-31 Lucas Monteiro Paes, Dennis Wei, Flavio P. Calmon

Context-aware feature attribution through argumentation

Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important…

Machine Learning · Computer Science 2023-10-26 Jinfeng Zhong, Elsa Negre

An Additive Instance-Wise Approach to Multi-class Model Interpretation

Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpreting methods focus on identifying explanatory input features, which generally fall into two main…

Machine Learning · Computer Science 2023-06-02 Vy Vo, Van Nguyen, Trung Le, Quan Hung Tran, Gholamreza Haffari, Seyit Camtepe, Dinh Phung

The Weighted Möbius Score: A Unified Framework for Feature Attribution

Feature attribution aims to explain the reasoning behind a black-box model's prediction by identifying the impact of each feature on the prediction. Recent work has extended feature attribution to interactions between multiple features.…

Machine Learning · Computer Science 2023-05-17 Yifan Jiang, Shane Steinert-Threlkeld

A Closer Look at Reward Decomposition for High-Level Robotic Explanations

Explaining the behaviour of intelligent agents learned by reinforcement learning (RL) to humans is challenging yet crucial due to their incomprehensible proprioceptive states, variational intermediate goals, and resultant unpredictability.…

Machine Learning · Computer Science 2023-11-07 Wenhao Lu, Xufeng Zhao, Sven Magg, Martin Gromniak, Mengdi Li, Stefan Wermter

Dependency Decomposition and a Reject Option for Explainable Models

Deploying machine learning models in safety-related domains (e.g. autonomous driving, medical diagnosis) demands for approaches that are explainable, robust against adversarial attacks and aware of the model uncertainty. Recent deep…

Computer Vision and Pattern Recognition · Computer Science 2020-12-14 Jan Kronenberger, Anselm Haselhoff

Toward Understanding the Disagreement Problem in Neural Network Feature Attribution

In recent years, neural networks have demonstrated their remarkable ability to discern intricate patterns and relationships from raw data. However, understanding the inner workings of these black box models remains challenging, yet crucial…

Machine Learning · Statistics 2024-04-18 Niklas Koenen, Marvin N. Wright

Considerations When Learning Additive Explanations for Black-Box Models

Many methods to explain black-box models, whether local or global, are additive. In this paper, we study global additive explanations for non-additive models, focusing on four explanation methods: partial dependence, Shapley explanations…

Machine Learning · Statistics 2023-08-02 Sarah Tan, Giles Hooker, Paul Koch, Albert Gordo, Rich Caruana

Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End

Feature attributions and counterfactual explanations are popular approaches to explain a ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the…

Machine Learning · Computer Science 2021-06-01 Ramaravind Kommiya Mothilal, Divyat Mahajan, Chenhao Tan, Amit Sharma

Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings

Explainable AI aims to render model behavior understandable by humans, which can be seen as an intermediate step in extracting causal relations from correlative patterns. Due to the high risk of possible fatal decisions in image-based…

Computer Vision and Pattern Recognition · Computer Science 2023-06-16 Lukas Klein, João B. S. Carvalho, Mennatallah El-Assady, Paolo Penna, Joachim M. Buhmann, Paul F. Jaeger

Achieving Transparency in Distributed Machine Learning with Explainable Data Collaboration

Transparency of Machine Learning models used for decision support in various industries becomes essential for ensuring their ethical use. To that end, feature attribution methods such as SHAP (SHapley Additive exPlanations) are widely used…

Machine Learning · Computer Science 2022-12-08 Anna Bogdanova, Akira Imakura, Tetsuya Sakurai, Tomoya Fujii, Teppei Sakamoto, Hiroyuki Abe

Benchmarking the Attribution Quality of Vision Models

Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Robin Hesse, Simone Schaub-Meyer, Stefan Roth