Related papers: Robust Attribution Regularization

Attributional Robustness Training using Input-Gradient Spatial Alignment

Interpretability is an emerging area of research in trustworthy machine learning. Safe deployment of machine learning systems mandates that the prediction and its explanation be reliable and robust. Recently, it has been shown that the…

Computer Vision and Pattern Recognition · Computer Science 2020-07-21 Mayank Singh, Nupur Kumari, Puneet Mangla, Abhishek Sinha, Vineeth N Balasubramanian, Balaji Krishnamurthy

Axiomatic Attribution for Deep Networks

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution…

Machine Learning · Computer Science 2017-06-14 Mukund Sundararajan, Ankur Taly, Qiqi Yan

An Empirical Study on the Relation between Network Interpretability and Adversarial Robustness

Deep neural networks (DNNs) have had many successes, but they suffer from two major issues: (1) a vulnerability to adversarial examples and (2) a tendency to elude human interpretation. Interestingly, recent empirical and theoretical…

Machine Learning · Computer Science 2020-12-07 Adam Noack, Isaac Ahern, Dejing Dou, Boyang Li

Enhanced Regularizers for Attributional Robustness

Deep neural networks are the default choice of learning models for computer vision tasks. Extensive work has been carried out in recent years on explaining deep models for vision tasks such as classification. However, recent work has shown…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Anindya Sarkar, Anirban Sarkar, Vineeth N Balasubramanian

Improving Adversarial Robustness of Attribution via Implicit Regularization

The adversarial robustness of attributions is a fundamental requirement for reliable explainability in deep learning, yet existing approaches typically rely on computationally expensive explicit regularization. In this work, we show that…

Machine Learning · Computer Science 2026-05-29 Amir Mehrpanah, Matteo Gamba, Hossein Azizpour

Rethinking Robustness of Model Attributions

For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their…

Machine Learning · Computer Science 2023-12-19 Sandesh Kamath, Sankalp Mittal, Amit Deshpande, Vineeth N Balasubramanian

Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks

With the rise of deep neural networks, the challenge of explaining the predictions of these networks has become increasingly recognized. While many methods for explaining the decisions of deep neural networks exist, there is currently no…

Machine Learning · Computer Science 2022-07-13 Ian E. Nielsen, Dimah Dera, Ghulam Rasool, Nidhal Bouaynaya, Ravi P. Ramachandran

Four Axiomatic Characterizations of the Integrated Gradients Attribution Method

Deep neural networks have produced significant progress among machine learning models in terms of accuracy and functionality, but their inner workings are still largely unknown. Attribution methods seek to shine a light on these "black box"…

Machine Learning · Computer Science 2023-06-27 Daniel Lundstrom, Meisam Razaviyayn

Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients

Deep neural networks have proven remarkably effective at solving many classification problems, but have been criticized recently for two major weaknesses: the reasons behind their predictions are uninterpretable, and the predictions…

Machine Learning · Computer Science 2017-11-28 Andrew Slavin Ross, Finale Doshi-Velez

Learning Representations Robust to Group Shifts and Adversarial Examples

Despite the high performance achieved by deep neural networks on various tasks, extensive studies have demonstrated that small tweaks in the input could fail the model predictions. This issue of deep neural networks has led to a number of…

Machine Learning · Computer Science 2022-02-22 Ming-Chang Chiu, Xuezhe Ma

Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

In recent years, neural networks have demonstrated outstanding effectiveness in a large amount of applications. However, recent works have shown that neural networks are susceptible to adversarial examples, indicating possible flaws…

Machine Learning · Computer Science 2018-06-08 Fuxun Yu, Zirui Xu, Yanzhi Wang, Chenchen Liu, Xiang Chen

Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods

This paper studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness that largely ignores the difference in the model's outputs and introduces a new way of…

Machine Learning · Computer Science 2025-12-09 Panagiota Kiourti, Anu Singh, Preeti Duraipandian, Weichao Zhou, Wenchao Li

A Rigorous Study of Integrated Gradients Method and Extensions to Internal Neuron Attributions

As deep learning (DL) efficacy grows, concerns for poor model explainability grow also. Attribution methods address the issue of explainability by quantifying the importance of an input feature for a model prediction. Among various methods,…

Machine Learning · Computer Science 2022-07-01 Daniel Lundstrom, Tianjian Huang, Meisam Razaviyayn

Learning Robust Models Using The Principle of Independent Causal Mechanisms

Standard supervised learning breaks down under data distribution shift. However, the principle of independent causal mechanisms (ICM, Peters et al. (2017)) can turn this weakness into an opportunity: one can take advantage of distribution…

Machine Learning · Computer Science 2021-02-09 Jens Müller, Robert Schmier, Lynton Ardizzone, Carsten Rother, Ullrich Köthe

Model-Based Robust Deep Learning: Generalizing to Natural, Out-of-Distribution Data

While deep learning has resulted in major breakthroughs in many application domains, the frameworks commonly used in deep learning remain fragile to artificially-crafted and imperceptible changes in the data. In response to this fragility,…

Machine Learning · Computer Science 2020-11-03 Alexander Robey, Hamed Hassani, George J. Pappas

Improving performance of deep learning models with axiomatic attribution priors and expected gradients

Recent research has demonstrated that feature attribution methods for deep networks can themselves be incorporated into training; these attribution priors optimize for a model whose attributions have certain desirable properties -- most…

Machine Learning · Computer Science 2020-11-12 Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott Lundberg, Su-In Lee

On the Robustness of Removal-Based Feature Attributions

To explain predictions made by complex machine learning models, many feature attribution methods have been developed that assign importance scores to input features. Some recent work challenges the robustness of these methods by showing…

Machine Learning · Computer Science 2023-11-01 Chris Lin, Ian Covert, Su-In Lee

Towards Robust Dataset Learning

Adversarial training has been actively studied in recent computer vision research to improve the robustness of models. However, due to the huge computational cost of generating adversarial samples, adversarial training methods are often…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Yihan Wu, Xinda Li, Florian Kerschbaum, Heng Huang, Hongyang Zhang

Greedy PIG: Adaptive Integrated Gradients

Deep learning has become the standard approach for most machine learning tasks. While its impact is undeniable, interpreting the predictions of deep learning models from a human perspective remains a challenge. In contrast to model…

Machine Learning · Computer Science 2023-11-13 Kyriakos Axiotis, Sami Abu-al-haija, Lin Chen, Matthew Fahrbach, Gang Fu

Robust Implicit Backpropagation

Arguably the biggest challenge in applying neural networks is tuning the hyperparameters, in particular the learning rate. The sensitivity to the learning rate is due to the reliance on backpropagation to train the network. In this paper we…

Machine Learning · Statistics 2018-08-08 Francois Fagan, Garud Iyengar