Related papers: Attributional Robustness Training using Input-Grad…
For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their…
With the perpetual increase of complexity of the state-of-the-art deep neural networks, it becomes a more and more challenging task to maintain their interpretability. Our work aims to evaluate the effects of adversarial training utilized…
An emerging problem in trustworthy machine learning is to train models that produce robust interpretations for their predictions. We take a step towards solving this problem through the lens of axiomatic attribution of neural networks. Our…
Attribution maps are popular tools for explaining neural networks predictions. By assigning an importance value to each input dimension that represents its impact towards the outcome, they give an intuitive explanation of the decision…
To explain predictions made by complex machine learning models, many feature attribution methods have been developed that assign importance scores to input features. Some recent work challenges the robustness of these methods by showing…
Deep neural networks are the default choice of learning models for computer vision tasks. Extensive work has been carried out in recent years on explaining deep models for vision tasks such as classification. However, recent work has shown…
Adversarial robust models have been shown to learn more robust and interpretable features than standard trained models. As shown in [\cite{tsipras2018robustness}], such robust models inherit useful interpretable properties where the…
With the rise of deep neural networks, the challenge of explaining the predictions of these networks has become increasingly recognized. While many methods for explaining the decisions of deep neural networks exist, there is currently no…
The adversarial robustness of attributions is a fundamental requirement for reliable explainability in deep learning, yet existing approaches typically rely on computationally expensive explicit regularization. In this work, we show that…
Current SOTA adversarially robust models are mostly based on adversarial training (AT) and differ only by some regularizers either at inner maximization or outer minimization steps. Being repetitive in nature during the inner maximization…
This paper studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness that largely ignores the difference in the model's outputs and introduces a new way of…
Adversarial training has been actively studied in recent computer vision research to improve the robustness of models. However, due to the huge computational cost of generating adversarial samples, adversarial training methods are often…
While existing work in robust deep learning has focused on small pixel-level norm-based perturbations, this may not account for perturbations encountered in several real-world settings. In many such cases although test data might not be…
Adversarial Training (AT) is one of the most effective methods for developing robust deep neural networks (DNNs). However, AT faces a trade-off problem between clean accuracy and adversarial robustness. In this work, we reveal a surprising…
Recent works have empirically shown that there exist adversarial examples that can be hidden from neural network interpretability (namely, making network interpretation maps visually similar), or interpretability is itself susceptible to…
Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's…
Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts due to the greedy adoption of all the correlations found in training data. There is an emerging literature on tackling this problem by…
Deep learning-based discriminative classifiers, despite their remarkable success, remain vulnerable to adversarial examples that can mislead model predictions. While adversarial training can enhance robustness, it fails to address the…
Interpreting the decisions of complex computer vision models is crucial to establish trust and accountability, especially in safety-critical domains. An established approach to interpretability is generating visual attribution maps that…
Deep neural networks (DNNs) have had many successes, but they suffer from two major issues: (1) a vulnerability to adversarial examples and (2) a tendency to elude human interpretation. Interestingly, recent empirical and theoretical…