Related papers: A Differentiable Language Model Adversarial Attack…

Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers

An adversarial attack paradigm explores various scenarios for the vulnerability of deep learning models: minor changes of the input can force a model failure. Most of the state of the art frameworks focus on adversarial attacks for images…

Machine Learning · Computer Science 2020-06-22 I. Fursov, A. Zaytsev, N. Kluchnikov, A. Kravchenko, E. Burnaev

On Adversarial Examples for Text Classification by Perturbing Latent Representations

Recently, with the advancement of deep learning, several applications in text classification have advanced significantly. However, this improvement comes with a cost because deep learning is vulnerable to adversarial examples. This weakness…

Machine Learning · Computer Science 2024-05-08 Korn Sooksatra, Bikram Khanal, Pablo Rivas

TextDecepter: Hard Label Black Box Attack on Text Classifiers

Machine learning has been proven to be susceptible to carefully crafted samples, known as adversarial examples. The generation of these adversarial examples helps to make the models more robust and gives us an insight into the underlying…

Computation and Language · Computer Science 2020-12-29 Sachin Saxena

Fooling the Textual Fooler via Randomizing Latent Representations

Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial…

Computation and Language · Computer Science 2024-06-11 Duy C. Hoang, Quang H. Nguyen, Saurav Manchanda, MinLong Peng, Kok-Seng Wong, Khoa D. Doan

Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations

Adversarial attacking aims to fool deep neural networks with adversarial examples. In the field of natural language processing, various textual adversarial attack models have been proposed, varying in the accessibility to the victim model.…

Computation and Language · Computer Science 2020-09-22 Yuan Zang, Bairu Hou, Fanchao Qi, Zhiyuan Liu, Xiaojun Meng, Maosong Sun

Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks

Deep neural networks are vulnerable to adversarial examples, which can mislead classifiers by adding imperceptible perturbations. An intriguing property of adversarial examples is their good transferability, making black-box attacks…

Computer Vision and Pattern Recognition · Computer Science 2019-04-08 Yinpeng Dong, Tianyu Pang, Hang Su, Jun Zhu

Model Robustness with Text Classification: Semantic-preserving adversarial attacks

We propose algorithms to create adversarial attacks to assess model robustness in text classification problems. They can be used to create white box attacks and black box attacks while at the same time preserving the semantics and syntax of…

Computation and Language · Computer Science 2020-08-17 Rahul Singh, Tarun Joshi, Vijayan N. Nair, Agus Sudjianto

On the Transferability of Adversarial Attacks against Neural Text Classifier

Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we…

Machine Learning · Computer Science 2021-09-23 Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, Kai-Wei Chang

Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution

Recent studies have shown that deep neural networks are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend against adversarial word-substitution attacks for neural NLP models. However,…

Computation and Language · Computer Science 2021-10-07 Zongyi Li, Jianhan Xu, Jiehang Zeng, Linyang Li, Xiaoqing Zheng, Qi Zhang, Kai-Wei Chang, Cho-Jui Hsieh

Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability

Training robust deep learning models for down-stream tasks is a critical challenge. Research has shown that down-stream models can be easily fooled with adversarial inputs that look like the training data, but slightly perturbed, in a way…

Machine Learning · Computer Science 2021-01-19 Mahmoud Hossam, Trung Le, He Zhao, Dinh Phung

Adversarial Example Detection by Classification for Deep Speech Recognition

Machine Learning systems are vulnerable to adversarial attacks and will highly likely produce incorrect outputs under these attacks. There are white-box and black-box attacks regarding to adversary's access level to the victim learning…

Machine Learning · Computer Science 2019-10-23 Saeid Samizade, Zheng-Hua Tan, Chao Shen, Xiaohong Guan

Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks

We propose a novel gradient-based attack against transformer-based language models that searches for an adversarial example in a continuous space of token probabilities. Our algorithm mitigates the gap between adversarial loss for…

Computation and Language · Computer Science 2023-02-13 Piotr Gaiński, Klaudia Bałazy

On Adversarial Examples for Character-Level Neural Machine Translation

Evaluating on adversarial examples has become a standard procedure to measure robustness of deep learning models. Due to the difficulty of creating white-box adversarial examples for discrete text input, most analyses of the robustness of…

Computation and Language · Computer Science 2018-06-26 Javid Ebrahimi, Daniel Lowd, Dejing Dou

Identifying Adversarial Attacks on Text Classifiers

The landscape of adversarial attacks against text classifiers continues to grow, with new attacks developed every year and many of them available in standard toolkits, such as TextAttack and OpenAttack. In response, there is a growing body…

Computation and Language · Computer Science 2022-01-24 Zhouhang Xie, Jonathan Brophy, Adam Noack, Wencong You, Kalyani Asthana, Carter Perkins, Sabrina Reis, Sameer Singh, Daniel Lowd

Deep Text Classification Can be Fooled

In this paper, we present an effective method to craft text adversarial samples, revealing one important yet underestimated fact that DNN-based text classifiers are also prone to adversarial sample attack. Specifically, confronted with…

Cryptography and Security · Computer Science 2019-01-08 Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, Wenchang Shi

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. Adversarial attacks can have serious consequences, particularly in…

Machine Learning · Computer Science 2025-09-15 Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak, Tapadhir Das

A Generative Adversarial Attack for Multilingual Text Classifiers

Current adversarial attack algorithms, where an adversary changes a text to fool a victim model, have been repeatedly shown to be effective against text classifiers. These attacks, however, generally assume that the victim model is…

Computation and Language · Computer Science 2024-01-17 Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi

Rethinking Textual Adversarial Defense for Pre-trained Language Models

Although pre-trained language models (PrLMs) have achieved significant success, recent studies demonstrate that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations on different levels…

Computation and Language · Computer Science 2022-08-23 Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection

Textual adversarial attacks pose a serious security threat to Natural Language Processing (NLP) systems by introducing imperceptible perturbations that mislead deep learning models. While adversarial example detection offers a lightweight…

Computation and Language · Computer Science 2026-03-16 He Zhu, Yanshu Li, Wen Liu, Haitian Yang

Adv-OLM: Generating Textual Adversaries via OLM

Deep learning models are susceptible to adversarial examples that have imperceptible perturbations in the original input, resulting in adversarial attacks against these models. Analysis of these attacks on the state of the art transformers…

Computation and Language · Computer Science 2021-01-22 Vijit Malik, Ashwani Bhat, Ashutosh Modi