Related papers: Learning Interpretable Features via Adversarially …

Proper Network Interpretability Helps Adversarial Robustness in Classification

Recent works have empirically shown that there exist adversarial examples that can be hidden from neural network interpretability (namely, making network interpretation maps visually similar), or interpretability is itself susceptible to…

Machine Learning · Computer Science 2020-10-23 Akhilan Boopathy, Sijia Liu, Gaoyuan Zhang, Cynthia Liu, Pin-Yu Chen, Shiyu Chang, Luca Daniel

Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection

With the perpetual increase of complexity of the state-of-the-art deep neural networks, it becomes a more and more challenging task to maintain their interpretability. Our work aims to evaluate the effects of adversarial training utilized…

Computer Vision and Pattern Recognition · Computer Science 2023-11-21 Delyan Boychev

Robust Feature-Level Adversaries are Interpretability Tools

The literature on adversarial attacks in computer vision typically focuses on pixel-level perturbations. These tend to be very difficult to interpret. Recent work that manipulates the latent representations of image generators to create…

Machine Learning · Computer Science 2023-09-12 Stephen Casper, Max Nadeau, Dylan Hadfield-Menell, Gabriel Kreiman

Causal Interpretability for Adversarial Robustness: A Hybrid Generative Classification Approach

Deep learning-based discriminative classifiers, despite their remarkable success, remain vulnerable to adversarial examples that can mislead model predictions. While adversarial training can enhance robustness, it fails to address the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Chunheng Zhao, Pierluigi Pisu, Gurcan Comert, Negash Begashaw, Varghese Vaidyan, Nina Christine Hubig

Towards Robust Dataset Learning

Adversarial training has been actively studied in recent computer vision research to improve the robustness of models. However, due to the huge computational cost of generating adversarial samples, adversarial training methods are often…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Yihan Wu, Xinda Li, Florian Kerschbaum, Heng Huang, Hongyang Zhang

On the Benefits of Models with Perceptually-Aligned Gradients

Adversarial robust models have been shown to learn more robust and interpretable features than standard trained models. As shown in [\cite{tsipras2018robustness}], such robust models inherit useful interpretable properties where the…

Computer Vision and Pattern Recognition · Computer Science 2020-05-05 Gunjan Aggarwal, Abhishek Sinha, Nupur Kumari, Mayank Singh

On the Lack of Robust Interpretability of Neural Text Classifiers

With the ever-increasing complexity of neural language models, practitioners have turned to methods for understanding the predictions of these models. One of the most well-adopted approaches for model interpretability is feature-based…

Computation and Language · Computer Science 2021-06-10 Muhammad Bilal Zafar, Michele Donini, Dylan Slack, Cédric Archambeau, Sanjiv Das, Krishnaram Kenthapadi

Learning Interpretable Microscopic Features of Tumor by Multi-task Adversarial CNNs To Improve Generalization

Adopting Convolutional Neural Networks (CNNs) in the daily routine of primary diagnosis requires not only near-perfect precision, but also a sufficient degree of generalization to data acquisition shifts and transparency. Existing CNN…

Computer Vision and Pattern Recognition · Computer Science 2023-06-22 Mara Graziani, Sebastian Otalora, Stephane Marchand-Maillet, Henning Muller, Vincent Andrearczyk

Towards More Robust Interpretation via Local Gradient Alignment

Neural network interpretation methods, particularly feature attribution methods, are known to be fragile with respect to adversarial input perturbations. To address this, several methods for enhancing the local smoothness of the gradient…

Computer Vision and Pattern Recognition · Computer Science 2022-12-08 Sunghwan Joo, Seokhyeon Jeong, Juyeon Heo, Adrian Weller, Taesup Moon

Secure Diagnostics: Adversarial Robustness Meets Clinical Interpretability

Deep neural networks for medical image classification often fail to generalize consistently in clinical practice due to violations of the i.i.d. assumption and opaque decision-making. This paper examines interpretability in deep neural…

Computer Vision and Pattern Recognition · Computer Science 2025-04-09 Mohammad Hossein Najafi, Mohammad Morsali, Mohammadreza Pashanejad, Saman Soleimani Roudi, Mohammad Norouzi, Saeed Bagheri Shouraki

Interpretability-Guided Test-Time Adversarial Defense

We propose a novel and low-cost test-time adversarial defense by devising interpretability-guided neuron importance ranking methods to identify neurons important to the output classes. Our method is a training-free approach that can…

Computer Vision and Pattern Recognition · Computer Science 2024-09-24 Akshay Kulkarni, Tsui-Wei Weng

Robust Classification using Robust Feature Augmentation

Existing deep neural networks, say for image classification, have been shown to be vulnerable to adversarial images that can cause a DNN misclassification, without any perceptible change to an image. In this work, we propose shock absorbing…

Machine Learning · Computer Science 2019-09-19 Kevin Eykholt, Swati Gupta, Atul Prakash, Amir Rahmati, Pratik Vaishnavi, Haizhong Zheng

Improving Interpretability via Regularization of Neural Activation Sensitivity

State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their wide adoption in mission-critical contexts is hampered by two major weaknesses - their susceptibility to adversarial attacks…

Machine Learning · Computer Science 2022-11-17 Ofir Moshe, Gil Fidel, Ron Bitton, Asaf Shabtai

Towards Robust Deep Neural Networks

We investigate the topics of sensitivity and robustness in feedforward and convolutional neural networks. Combining energy landscape techniques developed in computational chemistry with tools drawn from formal methods, we produce empirical…

Machine Learning · Statistics 2018-12-06 Timothy E. Wang, Yiming Gu, Dhagash Mehta, Xiaojun Zhao, Edgar A. Bernal

ANCHOR: Integrating Adversarial Training with Hard-mined Supervised Contrastive Learning for Robust Representation Learning

Neural networks have changed the way machines interpret the world. At their core, they learn by following gradients, adjusting their parameters step by step until they identify the most discriminant patterns in the data. This process gives…

Computer Vision and Pattern Recognition · Computer Science 2025-11-03 Samarup Bhattacharya, Anubhab Bhattacharya, Abir Chakraborty

Interpreting Adversarial Examples with Attributes

Deep computer vision systems being vulnerable to imperceptible and carefully crafted noise have raised questions regarding the robustness of their decisions. We take a step back and approach this problem from an orthogonal direction. We…

Computer Vision and Pattern Recognition · Computer Science 2019-04-18 Sadaf Gulshad, Jan Hendrik Metzen, Arnold Smeulders, Zeynep Akata

Learning More Robust Features with Adversarial Training

In recent years, it has been found that neural networks can be easily fooled by adversarial examples, which is a potential safety hazard in some safety-critical applications. Many researchers have proposed various methods to make neural…

Machine Learning · Computer Science 2018-04-24 Shuangtao Li, Yuanke Chen, Yanlin Peng, Lin Bai

Robust Sensible Adversarial Learning of Deep Neural Networks for Image Classification

The idea of robustness is central and critical to modern statistical analysis. However, despite the recent advances of deep neural networks (DNNs), many studies have shown that DNNs are vulnerable to adversarial attacks. Making…

Cryptography and Security · Computer Science 2023-06-02 Jungeum Kim, Xiao Wang

Interpreting and Improving Adversarial Robustness of Deep Neural Networks with Neuron Sensitivity

Deep neural networks (DNNs) are vulnerable to adversarial examples where inputs with imperceptible perturbations mislead DNNs to incorrect results. Despite the potential risk they bring, adversarial examples are also valuable for providing…

Computer Vision and Pattern Recognition · Computer Science 2020-12-15 Chongzhi Zhang, Aishan Liu, Xianglong Liu, Yitao Xu, Hang Yu, Yuqing Ma, Tianlin Li

An Empirical Study on the Relation between Network Interpretability and Adversarial Robustness

Deep neural networks (DNNs) have had many successes, but they suffer from two major issues: (1) a vulnerability to adversarial examples and (2) a tendency to elude human interpretation. Interestingly, recent empirical and theoretical…

Machine Learning · Computer Science 2020-12-07 Adam Noack, Isaac Ahern, Dejing Dou, Boyang Li