Related papers: Do Feature Attribution Methods Correctly Attribute…

Towards Rigorous Interpretations: a Formalisation of Feature Attribution

Feature attribution is often loosely presented as the process of selecting a subset of relevant features as a rationale of a prediction. Task-dependent by nature, precise definitions of "relevance" encountered in the literature are however…

Machine Learning · Computer Science 2021-07-12 Darius Afchar, Romain Hennequin, Vincent Guigue

Evaluating Feature Attribution Methods in the Image Domain

Feature attribution maps are a popular approach to highlight the most important pixels in an image for a given prediction of a model. Despite a recent growth in popularity and available methods, little attention is given to the objective…

Computer Vision and Pattern Recognition · Computer Science 2024-08-12 Arne Gevaert, Axel-Jan Rousseau, Thijs Becker, Dirk Valkenborg, Tijl De Bie, Yvan Saeys

Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency

Ensuring the trustworthiness and interpretability of machine learning models is critical to their deployment in real-world applications. Feature attribution methods have gained significant attention, which provide local explanations of…

Machine Learning · Computer Science 2023-09-20 Md Abdul Kadir, Gowtham Krishna Addluri, Daniel Sonntag

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that influence the output. One recent trend…

Machine Learning · Computer Science 2024-02-15 Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi

Benchmarking Attribution Methods with Relative Feature Importance

Interpretability is an important area of research for safe deployment of machine learning systems. One particular type of interpretability method attributes model decisions to input features. Despite active development, quantitative…

Machine Learning · Computer Science 2019-11-06 Mengjiao Yang, Been Kim

Towards Unified Attribution in Explainable AI, Data-Centric AI, and Mechanistic Interpretability

The increasing complexity of AI systems has made understanding their behavior critical. Numerous interpretability methods have been developed to attribute model behavior to three key aspects: input features, training data, and internal…

Machine Learning · Computer Science 2025-05-30 Shichang Zhang, Tessa Han, Usha Bhalla, Himabindu Lakkaraju

A Unified Taylor Framework for Revisiting Attribution Methods

Attribution methods have been developed to understand the decision-making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods often built…

Machine Learning · Statistics 2021-04-14 Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Xia Hu

Benchmarking the Attribution Quality of Vision Models

Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Robin Hesse, Simone Schaub-Meyer, Stefan Roth

Attribution Explanations for Deep Neural Networks: A Theoretical Perspective

Attribution explanation is a typical approach for explaining deep neural networks (DNNs), inferring an importance or contribution score for each input variable to the final output. In recent years, numerous attribution methods have been…

Machine Learning · Computer Science 2025-08-12 Huiqi Deng, Hongbin Pei, Quanshi Zhang, Mengnan Du

Feature Attribution from First Principles

Feature attribution methods are a popular approach to explain the behavior of machine learning models. They assign importance scores to each input feature, quantifying their influence on the model's prediction. However, evaluating these…

Machine Learning · Computer Science 2025-06-02 Magamed Taimeskhanov, Damien Garreau

Interpreting Interpretations: Organizing Attribution Methods by Criteria

Motivated by distinct, though related, criteria, a growing number of attribution methods have been developed to interpret deep learning. While each relies on the interpretability of the concept of "importance" and our ability to visualize…

Artificial Intelligence · Computer Science 2020-04-07 Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson

Show or Suppress? Managing Input Uncertainty in Machine Learning Model Explanations

Feature attribution is widely used in interpretable machine learning to explain how influential each measured input feature value is for an output inference. However, measurements can be uncertain, and it is unclear how the awareness of…

Machine Learning · Computer Science 2021-01-26 Danding Wang, Wencan Zhang, Brian Y. Lim

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through…

Machine Learning · Computer Science 2024-11-26 Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei

Hidden in Plain Sight -- Class Competition Focuses Attribution Maps

Attribution methods reveal which input features a neural network uses for a prediction, adding transparency to their decisions. A common problem is that these attributions seem unspecific, highlighting both important and irrelevant…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Nils Philipp Walter, Jilles Vreeken, Jonas Fischer

Towards Better Understanding Attribution Methods

Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-23 Sukrut Rao, Moritz Böhle, Bernt Schiele

Better Understanding Differences in Attribution Methods via Systematic Evaluations

Deep neural networks are very successful on many vision tasks, but hard to interpret due to their black box nature. To overcome this, various post-hoc attribution methods have been proposed to identify image regions most influential to the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-23 Sukrut Rao, Moritz Böhle, Bernt Schiele

Discriminative Attribution from Counterfactuals

We present a method for neural network interpretability by combining feature attribution with counterfactual explanations to generate attribution maps that highlight the most discriminative features between pairs of classes. We show that…

Machine Learning · Computer Science 2021-09-29 Nils Eckstein, Alexander S. Bates, Gregory S. X. E. Jefferis, Jan Funke

Context-aware feature attribution through argumentation

Feature attribution is a fundamental task in both machine learning and data analysis, which involves determining the contribution of individual features or variables to a model's output. This process helps identify the most important…

Machine Learning · Computer Science 2023-10-26 Jinfeng Zhong, Elsa Negre

The effectiveness of feature attribution methods and its correlation with automatic evaluation scores

Explaining the decisions of an Artificial Intelligence (AI) model is increasingly critical in many real-world, high-stakes applications. Hundreds of papers have either proposed new feature attribution methods, discussed or harnessed these…

Computer Vision and Pattern Recognition · Computer Science 2022-07-25 Giang Nguyen, Daeyoung Kim, Anh Nguyen

llmSHAP: A Principled Approach to LLM Explainability

Feature attribution methods help make machine learning-based inference explainable by determining how much one or several features have contributed to a model's output. A particularly popular attribution method is based on the Shapley value…

Artificial Intelligence · Computer Science 2025-11-04 Filip Naudot, Tobias Sundqvist, Timotheus Kampik