Related papers: Unlearning-based Neural Interpretations

Interpretation of Neural Networks is Susceptible to Universal Adversarial Perturbations

Interpreting neural network classifiers using gradient-based saliency maps has been extensively studied in the deep learning literature. While the existing algorithms manage to achieve satisfactory performance in application to standard…

Computer Vision and Pattern Recognition · Computer Science 2024-04-23 Haniyeh Ehsani Oskouie, Farzan Farnia

Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning

Machine Unlearning has recently garnered significant attention, aiming to selectively remove knowledge associated with specific data while preserving the model's performance on the remaining data. A fundamental challenge in this process is…

Machine Learning · Computer Science 2025-07-29 Gaurav Patel, Qiang Qiu

Gradient-based Analysis of NLP Models is Manipulable

Gradient-based analysis methods, such as saliency map visualizations and adversarial input perturbations, have found widespread use in interpreting neural NLP models due to their simplicity, flexibility, and most importantly, their…

Computation and Language · Computer Science 2020-10-13 Junlin Wang, Jens Tuyls, Eric Wallace, Sameer Singh

The Neglected Baseline in Model Interpretation

We observe that existing model interpretation methods generally ignore the baseline, and such neglect often results in imprecise or even incorrect interpretation. In this paper, we reformulate the task of model interpretation and the…

Computer Vision and Pattern Recognition · Computer Science 2026-05-27 Yongjin Cui, Xiaohui Fan

Targeted Unlearning Using Perturbed Sign Gradient Methods With Applications On Medical Images

Machine unlearning aims to remove the influence of specific training samples from a trained model without full retraining. While prior work has largely focused on privacy-motivated settings, we recast unlearning as a general-purpose tool…

Image and Video Processing · Electrical Eng. & Systems 2026-02-11 George R. Nahass, Zhu Wang, Homa Rashidisabet, Won Hwa Kim, Sasha Hubschman, Jeffrey C. Peterson, Chad A. Purnell, Pete Setabutr, Ann Q. Tran, Darvin Yi, Sathya N. Ravi

Dichotomy of Feature Learning and Unlearning: Fast-Slow Analysis on Neural Networks with Stochastic Gradient Descent

The dynamics of gradient-based training in neural networks often exhibit nontrivial structures; hence, understanding them remains a central challenge in theoretical machine learning. In particular, a concept of feature unlearning, in which…

Machine Learning · Computer Science 2026-02-10 Shota Imai, Sota Nishiyama, Masaaki Imaizumi

A Learning Paradigm for Interpretable Gradients

This paper studies interpretability of convolutional networks by means of saliency maps. Most approaches based on Class Activation Maps (CAM) combine information from fully connected layers and gradient through variants of backpropagation.…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Felipe Torres Figueroa, Hanwei Zhang, Ronan Sicre, Yannis Avrithis, Stephane Ayache

Stable Forgetting: Bounded Parameter-Efficient Unlearning in Foundation Models

Machine unlearning in foundation models (e.g., language and vision transformers) is essential for privacy and safety; however, existing approaches are unstable and unreliable. A widely used strategy, the gradient difference method, applies…

Machine Learning · Computer Science 2026-03-19 Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey

Unsupervised Interpretable Basis Extraction for Concept-Based Visual Explanations

An important line of research attempts to explain CNN image classifier predictions and intermediate layer representations in terms of human-understandable concepts. Previous work supports that deep representations are linearly separable…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Alexandros Doumanoglou, Stylianos Asteriadis, Dimitrios Zarpalas

Gradients as Features for Deep Representation Learning

We address the challenging problem of deep representation learning--the efficient adaption of a pre-trained deep network to different tasks. Specifically, we propose to explore gradient-based features. These features are gradients of the…

Machine Learning · Computer Science 2020-04-14 Fangzhou Mu, Yingyu Liang, Yin Li

Deep Explainable Learning with Graph Based Data Assessing and Rule Reasoning

Learning an explainable classifier often results in low accuracy model or ends up with a huge rule set, while learning a deep model is usually more capable of handling noisy data at scale, but with the cost of hard to explain the result and…

Artificial Intelligence · Computer Science 2022-11-11 Yuanlong Li, Gaopan Huang, Min Zhou, Chuan Fu, Honglin Qiao, Yan He

CDLNet: Robust and Interpretable Denoising Through Deep Convolutional Dictionary Learning

Deep learning based methods hold state-of-the-art results in image denoising, but remain difficult to interpret due to their construction from poorly understood building blocks such as batch-normalization, residual learning, and feature…

Image and Video Processing · Electrical Eng. & Systems 2021-03-09 Nikola Janjušević, Amirhossein Khalilian-Gourtani, Yao Wang

Remaining Useful Life Estimation Under Uncertainty with Causal GraphNets

In this work, a novel approach for the construction and training of time series models is presented that deals with the problem of learning on large time series with non-equispaced observations, which at the same time may possess features…

Machine Learning · Computer Science 2020-11-25 Charilaos Mylonas, Eleni Chatzi

Gradients as a Measure of Uncertainty in Neural Networks

Despite tremendous success of modern neural networks, they are known to be overconfident even when the model encounters inputs with unfamiliar conditions. Detecting such inputs is vital to preventing models from making naive predictions…

Computer Vision and Pattern Recognition · Computer Science 2020-09-07 Jinsol Lee, Ghassan AlRegib

Learning Interpretable Deep Disentangled Neural Networks for Hyperspectral Unmixing

Although considerable effort has been dedicated to improving the solution to the hyperspectral unmixing problem, non-idealities such as complex radiation scattering and endmember variability negatively impact the performance of most…

Image and Video Processing · Electrical Eng. & Systems 2023-10-05 Ricardo Augusto Borsoi, Deniz Erdoğmuş, Tales Imbiriba

Learning fixed points of recurrent neural networks by reparameterizing the network model

In computational neuroscience, fixed points of recurrent neural networks are commonly used to model neural responses to static or slowly changing stimuli. These applications raise the question of how to train the weights in a recurrent…

Neurons and Cognition · Quantitative Biology 2023-07-28 Vicky Zhu, Robert Rosenbaum

Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning

Machine unlearning has emerged as a critical capability for addressing privacy, safety, and regulatory concerns in large language models (LLMs). Existing methods operate at the sequence level, applying uniform updates across all tokens…

Computation and Language · Computer Science 2026-05-07 Jiawei Wu, Doudou Zhou

TAG: Task-based Accumulated Gradients for Lifelong learning

When an agent encounters a continual stream of new tasks in the lifelong learning setting, it leverages the knowledge it gained from the earlier tasks to help learn the new tasks better. In such a scenario, identifying an efficient…

Machine Learning · Computer Science 2022-08-31 Pranshu Malviya, Balaraman Ravindran, Sarath Chandar

SaliencyDecor: Enhancing Neural Network Interpretability through Feature Decorrelation

Gradient-based saliency methods are widely used to interpret deep neural networks, yet they often produce noisy and unstable explanations that poorly align with semantically meaningful input features. We argue that a fundamental cause of…

Computer Vision and Pattern Recognition · Computer Science 2026-04-29 Ali Karkehabadi, Jamshid Hassanpour, Houman Homayoun, Avesta Sasan

Gradient Estimation Using Stochastic Computation Graphs

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external…

Machine Learning · Computer Science 2016-01-06 John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel