Related papers: Adversarial Gain

Algebraic Adversarial Attacks on Integrated Gradients

Adversarial attacks on explainability models have drastic consequences when explanations are used to understand the reasoning of neural networks in safety critical systems. Path methods are one such class of attribution methods susceptible…

Machine Learning · Computer Science 2025-02-28 Lachlan Simpson, Federico Costanza, Kyle Millar, Adriel Cheng, Cheng-Chew Lim, Hong Gunn Chew

Elephant in the Room: An Evaluation Framework for Assessing Adversarial Examples in NLP

An adversarial example is an input transformed by small perturbations that machine learning models consistently misclassify. While there are a number of methods proposed to generate adversarial examples for text data, it is not trivial to…

Computation and Language · Computer Science 2020-06-02 Ying Xu, Xu Zhong, Antonio Jose Jimeno Yepes, Jey Han Lau

Adversarial machine learning for protecting against online manipulation

Adversarial examples are inputs to a machine learning system that result in an incorrect output from that system. Attacks launched through this type of input can cause severe consequences: for example, in the field of image recognition, a…

Machine Learning · Computer Science 2021-11-24 Stefano Cresci, Marinella Petrocchi, Angelo Spognardi, Stefano Tognazzi

Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge

Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake. They are useful for understanding the shortcomings of machine learning models, interpreting their results, and for regularisation. In…

Machine Learning · Computer Science 2018-08-28 Pasquale Minervini, Sebastian Riedel

Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers

An adversarial attack paradigm explores various scenarios for the vulnerability of deep learning models: minor changes of the input can force a model failure. Most of the state of the art frameworks focus on adversarial attacks for images…

Machine Learning · Computer Science 2020-06-22 I. Fursov, A. Zaytsev, N. Kluchnikov, A. Kravchenko, E. Burnaev

Adversarial Examples in Modern Machine Learning: A Review

Recent research has found that many families of machine learning models are vulnerable to adversarial examples: inputs that are specifically designed to cause the target model to produce erroneous outputs. In this survey, we focus on…

Machine Learning · Computer Science 2019-11-19 Rey Reza Wiyatno, Anqi Xu, Ousmane Dia, Archy de Berker

When and How to Fool Explainable Models (and Humans) with Adversarial Examples

Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations. Some of the main shortcomings are the lack of interpretability and the lack of robustness against adversarial…

Machine Learning · Computer Science 2025-02-18 Jon Vadillo, Roberto Santana, Jose A. Lozano

Generating Band-Limited Adversarial Surfaces Using Neural Networks

Generating adversarial examples is the art of creating a noise that is added to an input signal of a classifying neural network, and thus changing the network's classification, while keeping the noise as tenuous as possible. While the…

Computer Vision and Pattern Recognition · Computer Science 2021-12-08 Roee Ben-Shlomo, Yevgeniy Men, Ido Imanuel

Beating Attackers At Their Own Games: Adversarial Example Detection Using Adversarial Gradient Directions

Adversarial examples are input examples that are specifically crafted to deceive machine learning classifiers. State-of-the-art adversarial example detection methods characterize an input example as adversarial either by quantifying the…

Computer Vision and Pattern Recognition · Computer Science 2021-01-01 Yuhang Wu, Sunpreet S. Arora, Yanhong Wu, Hao Yang

A New Kind of Adversarial Example

Almost all adversarial attacks are formulated to add an imperceptible perturbation to an image in order to fool a model. Here, we consider the opposite which is adversarial examples that can fool a human but not a model. A large enough and…

Computer Vision and Pattern Recognition · Computer Science 2022-08-26 Ali Borji

Adversarial Machine Learning at Scale

Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial…

Computer Vision and Pattern Recognition · Computer Science 2017-02-14 Alexey Kurakin, Ian Goodfellow, Samy Bengio

Towards Explaining Adversarial Examples Phenomenon in Artificial Neural Networks

In this paper, we study the adversarial examples existence and adversarial training from the standpoint of convergence and provide evidence that pointwise convergence in ANNs can explain these observations. The main contribution of our…

Machine Learning · Computer Science 2022-05-27 Ramin Barati, Reza Safabakhsh, Mohammad Rahmati

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. Adversarial attacks can have serious consequences, particularly in…

Machine Learning · Computer Science 2025-09-15 Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak, Tapadhir Das

Adversarial classification: An adversarial risk analysis approach

Classification problems in security settings are usually contemplated as confrontations in which one or more adversaries try to fool a classifier to obtain a benefit. Most approaches to such adversarial classification problems have focused…

Machine Learning · Statistics 2019-09-25 Roi Naveiro, Alberto Redondo, David Ríos Insua, Fabrizio Ruggeri

Adversarial Examples - A Complete Characterisation of the Phenomenon

We provide a complete characterisation of the phenomenon of adversarial examples - inputs intentionally crafted to fool machine learning models. We aim to cover all the important concerns in this field of study: (1) the conjectures on the…

Computer Vision and Pattern Recognition · Computer Science 2019-02-19 Alexandru Constantin Serban, Erik Poll, Joost Visser

Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness

Adversarial examples are malicious inputs crafted to cause a model to misclassify them. Their most common instantiation, "perturbation-based" adversarial examples introduce changes to the input that leave its true label unchanged, yet…

Machine Learning · Computer Science 2019-03-26 Jörn-Henrik Jacobsen, Jens Behrmann, Nicholas Carlini, Florian Tramèr, Nicolas Papernot

Customizing an Adversarial Example Generator with Class-Conditional GANs

Adversarial examples are intentionally crafted data with the purpose of deceiving neural networks into misclassification. When we talk about strategies to create such examples, we usually refer to perturbation-based methods that fabricate…

Computer Vision and Pattern Recognition · Computer Science 2018-06-28 Shih-hong Tsai

Adversarial Examples for Good: Adversarial Examples Guided Imbalanced Learning

Adversarial examples are inputs for machine learning models that have been designed by attackers to cause the model to make mistakes. In this paper, we demonstrate that adversarial examples can also be utilized for good to improve the…

Machine Learning · Computer Science 2022-08-31 Jie Zhang, Lei Zhang, Gang Li, Chao Wu

On the (Statistical) Detection of Adversarial Examples

Machine Learning (ML) models are applied in a variety of tasks such as network intrusion detection or Malware classification. Yet, these models are vulnerable to a class of malicious inputs known as adversarial examples. These are slightly…

Cryptography and Security · Computer Science 2017-10-18 Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, Patrick McDaniel

Balanced Adversarial Training: Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models

Traditional (fickle) adversarial examples involve finding a small perturbation that does not change an input's true label but confuses the classifier into outputting a different prediction. Conversely, obstinate adversarial examples occur…

Computation and Language · Computer Science 2022-11-01 Hannah Chen, Yangfeng Ji, David Evans