Related papers: Learning Explanations from Language Data

Training Feature Attribution for Vision Models

Deep neural networks are often considered opaque systems, prompting the need for explainability methods to improve trust and accountability. Existing approaches typically attribute test-time predictions either to input features (e.g.,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-13 Aziz Bacha, Thomas George

Discriminative Attribution from Counterfactuals

We present a method for neural network interpretability by combining feature attribution with counterfactual explanations to generate attribution maps that highlight the most discriminative features between pairs of classes. We show that…

Machine Learning · Computer Science 2021-09-29 Nils Eckstein, Alexander S. Bates, Gregory S. X. E. Jefferis, Jan Funke

Learning how to explain neural networks: PatternNet and PatternAttribution

DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with…

Machine Learning · Statistics 2017-10-26 Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, Sven Dähne

Interpreting Interpretations: Organizing Attribution Methods by Criteria

Motivated by distinct, though related, criteria, a growing number of attribution methods have been developed to interpret deep learning. While each relies on the interpretability of the concept of "importance" and our ability to visualize…

Artificial Intelligence · Computer Science 2020-04-07 Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson

Learning Deep Attribution Priors Based On Prior Knowledge

Feature attribution methods, which explain an individual prediction made by a model as a sum of attributions for each input feature, are an essential tool for understanding the behavior of complex deep learning models. However, ensuring…

Machine Learning · Computer Science 2020-10-28 Ethan Weinberger, Joseph Janizek, Su-In Lee

Obtaining Example-Based Explanations from Deep Neural Networks

Most techniques for explainable machine learning focus on feature attribution, i.e., values are assigned to the features such that their sum equals the prediction. Example attribution is another form of explanation that assigns weights to…

Machine Learning · Computer Science 2025-02-28 Genghua Dong, Henrik Boström, Michalis Vazirgiannis, Roman Bresson

Visual Reasoning of Feature Attribution with Deep Recurrent Neural Networks

Deep Recurrent Neural Network (RNN) has gained popularity in many sequence classification tasks. Beyond predicting a correct class for each data instance, data scientists also want to understand what differentiating factors in the data have…

Machine Learning · Computer Science 2019-01-18 Chuan Wang, Takeshi Onishi, Keiichi Nemoto, Kwan-Liu Ma

Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals

Interpretability is crucial for machine learning algorithms in high-stakes medical applications. However, high-performing neural networks typically cannot explain their predictions. Post-hoc explanation methods provide a way to understand…

Computer Vision and Pattern Recognition · Computer Science 2025-11-14 Susu Sun, Stefano Woerner, Andreas Maier, Lisa M. Koch, Christian F. Baumgartner

Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction

Explaining recommendations enables users to understand whether recommended items are relevant to their needs and has been shown to increase their trust in the system. More generally, if designing explainable machine learning models is key…

Machine Learning · Computer Science 2020-08-27 Darius Afchar, Romain Hennequin

Towards Unified Attribution in Explainable AI, Data-Centric AI, and Mechanistic Interpretability

The increasing complexity of AI systems has made understanding their behavior critical. Numerous interpretability methods have been developed to attribute model behavior to three key aspects: input features, training data, and internal…

Machine Learning · Computer Science 2025-05-30 Shichang Zhang, Tessa Han, Usha Bhalla, Himabindu Lakkaraju

Harmonizing Feature Attributions Across Deep Learning Architectures: Enhancing Interpretability and Consistency

Ensuring the trustworthiness and interpretability of machine learning models is critical to their deployment in real-world applications. Feature attribution methods have gained significant attention, which provide local explanations of…

Machine Learning · Computer Science 2023-09-20 Md Abdul Kadir, Gowtham Krishna Addluri, Daniel Sonntag

Sampling Matters in Explanations: Towards Trustworthy Attribution Analysis Building Block in Visual Models through Maximizing Explanation Certainty

Image attribution analysis seeks to highlight the feature representations learned by visual models such that the highlighted feature maps can reflect the pixel-wise importance of inputs. Gradient integration is a building block in the…

Computer Vision and Pattern Recognition · Computer Science 2025-06-26 Róisín Luo, James McDermott, Colm O'Riordan

Attribution Explanations for Deep Neural Networks: A Theoretical Perspective

Attribution explanation is a typical approach for explaining deep neural networks (DNNs), inferring an importance or contribution score for each input variable to the final output. In recent years, numerous attribution methods have been…

Machine Learning · Computer Science 2025-08-12 Huiqi Deng, Hongbin Pei, Quanshi Zhang, Mengnan Du

Explainable Deep Classification Models for Domain Generalization

Conventionally, AI models are thought to trade off explainability for lower accuracy. We develop a training strategy that not only leads to a more explainable AI system for object classification, but as a consequence, suffers no perceptible…

Computer Vision and Pattern Recognition · Computer Science 2020-03-17 Andrea Zunino, Sarah Adel Bargal, Riccardo Volpi, Mehrnoosh Sameki, Jianming Zhang, Stan Sclaroff, Vittorio Murino, Kate Saenko

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that influence the output. One recent trend…

Machine Learning · Computer Science 2024-02-15 Yang Zhang, Yawei Li, Hannah Brown, Mina Rezaei, Bernd Bischl, Philip Torr, Ashkan Khakzar, Kenji Kawaguchi

Benchmarking the Attribution Quality of Vision Models

Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Robin Hesse, Simone Schaub-Meyer, Stefan Roth

Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings

Explainable AI aims to render model behavior understandable by humans, which can be seen as an intermediate step in extracting causal relations from correlative patterns. Due to the high risk of possible fatal decisions in image-based…

Computer Vision and Pattern Recognition · Computer Science 2023-06-16 Lukas Klein, João B. S. Carvalho, Mennatallah El-Assady, Paolo Penna, Joachim M. Buhmann, Paul F. Jaeger

Attributing Learned Concepts in Neural Networks to Training Data

By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning…

Machine Learning · Computer Science 2023-12-29 Nicholas Konz, Charles Godfrey, Madelyn Shapiro, Jonathan Tu, Henry Kvinge, Davis Brown

Domain Adaptations for Computer Vision Applications

A basic assumption of statistical learning theory is that train and test data are drawn from the same underlying distribution. Unfortunately, this assumption doesn't hold in many applications. Instead, ample labeled data might exist in a…

Computer Vision and Pattern Recognition · Computer Science 2012-11-21 Oscar Beijbom

Seeing in Words: Learning to Classify through Language Bottlenecks

Neural networks for computer vision extract uninterpretable features despite achieving high accuracy on benchmarks. In contrast, humans can explain their predictions using succinct and intuitive descriptions. To incorporate explainability…

Computer Vision and Pattern Recognition · Computer Science 2023-07-04 Khalid Saifullah, Yuxin Wen, Jonas Geiping, Micah Goldblum, Tom Goldstein