Related papers: Sample Attackability in Natural Language Adversari…

Identifying Adversarially Attackable and Robust Samples

Adversarial attacks insert small, imperceptible perturbations to input samples that cause large, undesired changes to the output of deep learning models. Despite extensive research on generating adversarial attacks and building defense…

Machine Learning · Computer Science 2023-06-27 Vyas Raina , Mark Gales

A Survey of Adversarial Defences and Robustness in NLP

In the past few years, it has become increasingly evident that deep neural networks are not resilient enough to withstand adversarial perturbations in input data, leaving them vulnerable to attack. Various authors have proposed strong…

Computation and Language · Computer Science 2023-04-19 Shreya Goyal , Sumanth Doddapaneni , Mitesh M. Khapra , Balaraman Ravindran

Towards Imperceptible and Robust Adversarial Example Attacks against Neural Networks

Machine learning systems based on deep neural networks, being able to produce state-of-the-art results on various perception tasks, have gained mainstream adoption in many applications. However, they are shown to be vulnerable to…

Machine Learning · Computer Science 2018-01-16 Bo Luo , Yannan Liu , Lingxiao Wei , Qiang Xu

A Differentiable Language Model Adversarial Attack on Text Classifiers

Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve robustness of these models is an exploration of an adversarial…

Computation and Language · Computer Science 2021-07-26 Ivan Fursov , Alexey Zaytsev , Pavel Burnyshev , Ekaterina Dmitrieva , Nikita Klyuchnikov , Andrey Kravchenko , Ekaterina Artemova , Evgeny Burnaev

Enhancing adversarial robustness in Natural Language Inference using explanations

The surge of state-of-the-art Transformer-based models has undoubtedly pushed the limits of NLP model performance, excelling in a variety of tasks. We cast the spotlight on the underexplored task of Natural Language Inference (NLI), since…

Computation and Language · Computer Science 2025-08-04 Alexandros Koulakos , Maria Lymperaiou , Giorgos Filandrianos , Giorgos Stamou

Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey

With the development of high computational devices, deep neural networks (DNNs), in recent years, have gained significant popularity in many Artificial Intelligence (AI) applications. However, previous efforts have shown that DNNs were…

Computation and Language · Computer Science 2019-04-12 Wei Emma Zhang , Quan Z. Sheng , Ahoud Alhazmi , Chenliang Li

Recent Advances in Understanding Adversarial Robustness of Deep Neural Networks

Adversarial examples are inevitable on the road of pervasive applications of deep neural networks (DNN). Imperceptible perturbations applied on natural samples can lead DNN-based classifiers to output wrong prediction with fair confidence…

Machine Learning · Computer Science 2020-11-04 Tao Bai , Jinqi Luo , Jun Zhao

Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers

An adversarial attack paradigm explores various scenarios for the vulnerability of deep learning models: minor changes of the input can force a model failure. Most of the state of the art frameworks focus on adversarial attacks for images…

Machine Learning · Computer Science 2020-06-22 I. Fursov , A. Zaytsev , N. Kluchnikov , A. Kravchenko , E. Burnaev

Residue-Based Natural Language Adversarial Attack Detection

Deep learning based systems are susceptible to adversarial attacks, where a small, imperceptible change at the input alters the model prediction. However, to date the majority of the approaches to detect these attacks have been designed for…

Computation and Language · Computer Science 2022-09-27 Vyas Raina , Mark Gales

A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement

Recent years have seen the wide application of NLP models in crucial areas such as finance, medical treatment, and news media, raising concerns of the model robustness and vulnerabilities. In this paper, we propose a novel prompt-based…

Computation and Language · Computer Science 2022-03-22 Yuting Yang , Pei Huang , Juan Cao , Jintao Li , Yun Lin , Jin Song Dong , Feifei Ma , Jian Zhang

How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks

Natural Language Processing (NLP) models based on Machine Learning (ML) are susceptible to adversarial attacks -- malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions. However,…

Computation and Language · Computer Science 2023-05-26 Salijona Dyrmishi , Salah Ghamizi , Maxime Cordy

Distinguishing Non-natural from Natural Adversarial Samples for More Robust Pre-trained Language Model

Recently, the problem of robustness of pre-trained language models (PrLMs) has received increasing research interest. Latest studies on adversarial attacks achieve high attack success rates against PrLMs, claiming that PrLMs are not robust.…

Machine Learning · Computer Science 2022-03-23 Jiayi Wang , Rongzhou Bao , Zhuosheng Zhang , Hai Zhao

Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning

Intentionally crafted adversarial samples have effectively exploited weaknesses in deep neural networks. A standard method in adversarial robustness assumes a framework to defend against samples crafted by minimally perturbing a sample such…

Machine Learning · Computer Science 2022-11-07 Anaelia Ovalle , Evan Czyzycki , Cho-Jui Hsieh

Generating Valid and Natural Adversarial Examples with Large Language Models

Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream…

Computation and Language · Computer Science 2023-11-21 Zimu Wang , Wei Wang , Qi Chen , Qiufeng Wang , Anh Nguyen

Towards Deep Learning Models Resistant to Adversarial Attacks

Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples---inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings…

Machine Learning · Statistics 2019-09-06 Aleksander Madry , Aleksandar Makelov , Ludwig Schmidt , Dimitris Tsipras , Adrian Vladu

Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

Adversarial sample attacks perturb benign inputs to induce DNN misbehaviors. Recent research has demonstrated the widespread presence and the devastating consequences of such attacks. Existing defense techniques either assume prior…

Machine Learning · Computer Science 2018-10-30 Guanhong Tao , Shiqing Ma , Yingqi Liu , Xiangyu Zhang

Adversarial Examples: Attacks and Defenses for Deep Learning

With rapid progress and significant successes in a wide spectrum of applications, deep learning is being applied in many safety-critical environments. However, deep neural networks have been recently found vulnerable to well-designed input…

Machine Learning · Computer Science 2018-07-10 Xiaoyong Yuan , Pan He , Qile Zhu , Xiaolin Li

Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation

Word-level adversarial attacks have shown success in NLP models, drastically decreasing the performance of transformer-based models in recent years. As a countermeasure, adversarial defense has been explored, but relatively few efforts have…

Computation and Language · Computer Science 2022-03-04 KiYoon Yoo , Jangho Kim , Jiho Jang , Nojun Kwak

Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations

Adversarial attacking aims to fool deep neural networks with adversarial examples. In the field of natural language processing, various textual adversarial attack models have been proposed, varying in the accessibility to the victim model.…

Computation and Language · Computer Science 2020-09-22 Yuan Zang , Bairu Hou , Fanchao Qi , Zhiyuan Liu , Xiaojun Meng , Maosong Sun

Adversarial Attack and Defense of Structured Prediction Models

Building an effective adversarial attacker and elaborating on countermeasures for adversarial attacks for natural language processing (NLP) have attracted a lot of research in recent years. However, most of the existing approaches focus on…

Computation and Language · Computer Science 2020-10-20 Wenjuan Han , Liwen Zhang , Yong Jiang , Kewei Tu