Related papers: Attributional Robustness Training using Input-Grad…

Rethinking Robustness of Model Attributions

For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their…

Machine Learning · Computer Science 2023-12-19 Sandesh Kamath, Sankalp Mittal, Amit Deshpande, Vineeth N Balasubramanian

Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection

As state-of-the-art deep neural networks grow ever more complex, maintaining their interpretability becomes increasingly challenging. Our work aims to evaluate the effects of adversarial training utilized…

Computer Vision and Pattern Recognition · Computer Science 2023-11-21 Delyan Boychev

Robust Attribution Regularization

An emerging problem in trustworthy machine learning is to train models that produce robust interpretations for their predictions. We take a step towards solving this problem through the lens of axiomatic attribution of neural networks. Our…

Machine Learning · Computer Science 2019-10-29 Jiefeng Chen, Xi Wu, Vaibhav Rastogi, Yingyu Liang, Somesh Jha

FAR: A General Framework for Attributional Robustness

Attribution maps are popular tools for explaining neural network predictions. By assigning an importance value to each input dimension that represents its impact towards the outcome, they give an intuitive explanation of the decision…

Machine Learning · Computer Science 2022-03-09 Adam Ivankay, Ivan Girardi, Chiara Marchiori, Pascal Frossard

On the Robustness of Removal-Based Feature Attributions

To explain predictions made by complex machine learning models, many feature attribution methods have been developed that assign importance scores to input features. Some recent work challenges the robustness of these methods by showing…

Machine Learning · Computer Science 2023-11-01 Chris Lin, Ian Covert, Su-In Lee

Enhanced Regularizers for Attributional Robustness

Deep neural networks are the default choice of learning models for computer vision tasks. Extensive work has been carried out in recent years on explaining deep models for vision tasks such as classification. However, recent work has shown…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Anindya Sarkar, Anirban Sarkar, Vineeth N Balasubramanian

On the Benefits of Models with Perceptually-Aligned Gradients

Adversarially robust models have been shown to learn more robust and interpretable features than standard trained models. As shown in Tsipras et al. (2018), such robust models inherit useful interpretable properties where the…

Computer Vision and Pattern Recognition · Computer Science 2020-05-05 Gunjan Aggarwal, Abhishek Sinha, Nupur Kumari, Mayank Singh

Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks

With the rise of deep neural networks, the challenge of explaining the predictions of these networks has become increasingly recognized. While many methods for explaining the decisions of deep neural networks exist, there is currently no…

Machine Learning · Computer Science 2022-07-13 Ian E. Nielsen, Dimah Dera, Ghulam Rasool, Nidhal Bouaynaya, Ravi P. Ramachandran

Improving Adversarial Robustness of Attribution via Implicit Regularization

The adversarial robustness of attributions is a fundamental requirement for reliable explainability in deep learning, yet existing approaches typically rely on computationally expensive explicit regularization. In this work, we show that…

Machine Learning · Computer Science 2026-05-29 Amir Mehrpanah, Matteo Gamba, Hossein Azizpour

Get Fooled for the Right Reason: Improving Adversarial Robustness through a Teacher-guided Curriculum Learning Approach

Current SOTA adversarially robust models are mostly based on adversarial training (AT) and differ only by some regularizers either at inner maximization or outer minimization steps. Being repetitive in nature during the inner maximization…

Machine Learning · Computer Science 2021-11-02 Anindya Sarkar, Anirban Sarkar, Sowrya Gali, Vineeth N Balasubramanian

Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods

This paper studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness that largely ignores the difference in the model's outputs and introduces a new way of…

Machine Learning · Computer Science 2025-12-09 Panagiota Kiourti, Anu Singh, Preeti Duraipandian, Weichao Zhou, Wenchao Li

Towards Robust Dataset Learning

Adversarial training has been actively studied in recent computer vision research to improve the robustness of models. However, due to the huge computational cost of generating adversarial samples, adversarial training methods are often…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Yihan Wu, Xinda Li, Florian Kerschbaum, Heng Huang, Hongyang Zhang

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

While existing work in robust deep learning has focused on small pixel-level norm-based perturbations, this may not account for perturbations encountered in several real-world settings. In many such cases although test data might not be…

Computer Vision and Pattern Recognition · Computer Science 2021-04-09 Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training

Adversarial Training (AT) is one of the most effective methods for developing robust deep neural networks (DNNs). However, AT faces a trade-off problem between clean accuracy and adversarial robustness. In this work, we reveal a surprising…

Computer Vision and Pattern Recognition · Computer Science 2026-04-30 Yanyun Wang, Qingqing Ye, Li Liu, Zi Liang, Haibo Hu

Proper Network Interpretability Helps Adversarial Robustness in Classification

Recent works have empirically shown that there exist adversarial examples that can be hidden from neural network interpretability (namely, making network interpretation maps visually similar), or interpretability is itself susceptible to…

Machine Learning · Computer Science 2020-10-23 Akhilan Boopathy, Sijia Liu, Gaoyuan Zhang, Cynthia Liu, Pin-Yu Chen, Shiyu Chang, Luca Daniel

Robust Models Are More Interpretable Because Attributions Look Normal

Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's…

Machine Learning · Computer Science 2021-10-07 Zifan Wang, Matt Fredrikson, Anupam Datta

Distributionally Robust Learning with Stable Adversarial Training

Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts due to the greedy adoption of all the correlations found in training data. There is an emerging literature on tackling this problem by…

Machine Learning · Computer Science 2022-11-22 Jiashuo Liu, Zheyan Shen, Peng Cui, Linjun Zhou, Kun Kuang, Bo Li

Causal Interpretability for Adversarial Robustness: A Hybrid Generative Classification Approach

Deep learning-based discriminative classifiers, despite their remarkable success, remain vulnerable to adversarial examples that can mislead model predictions. While adversarial training can enhance robustness, it fails to address the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Chunheng Zhao, Pierluigi Pisu, Gurcan Comert, Negash Begashaw, Varghese Vaidyan, Nina Christine Hubig

Learn to Rank: Visual Attribution by Learning Importance Ranking

Interpreting the decisions of complex computer vision models is crucial to establish trust and accountability, especially in safety-critical domains. An established approach to interpretability is generating visual attribution maps that…

Computer Vision and Pattern Recognition · Computer Science 2026-04-08 David Schinagl, Christian Fruhwirth-Reisinger, Alexander Prutsch, Samuel Schulter, Horst Possegger

An Empirical Study on the Relation between Network Interpretability and Adversarial Robustness

Deep neural networks (DNNs) have had many successes, but they suffer from two major issues: (1) a vulnerability to adversarial examples and (2) a tendency to elude human interpretation. Interestingly, recent empirical and theoretical…

Machine Learning · Computer Science 2020-12-07 Adam Noack, Isaac Ahern, Dejing Dou, Boyang Li