Related papers: Identifying Adversarial Attacks on Text Classifier…

TCAB: A Large-Scale Text Classification Attack Benchmark

We introduce the Text Classification Attack Benchmark (TCAB), a dataset for analyzing, understanding, detecting, and labeling adversarial attacks against text classifiers. TCAB includes 1.5 million attack instances, generated by twelve…

Machine Learning · Computer Science 2022-10-25 Kalyani Asthana, Zhouhang Xie, Wencong You, Adam Noack, Jonathan Brophy, Sameer Singh, Daniel Lowd

Adversarial Attacks and Dimensionality in Text Classifiers

Adversarial attacks on machine learning algorithms have been a key deterrent to the adoption of AI in many real-world use cases. They significantly undermine the ability of high-performance neural networks by forcing misclassifications.…

Machine Learning · Computer Science 2024-04-04 Nandish Chattopadhyay, Atreya Goswami, Anupam Chattopadhyay

A Differentiable Language Model Adversarial Attack on Text Classifiers

Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve robustness of these models is an exploration of an adversarial…

Computation and Language · Computer Science 2021-07-26 Ivan Fursov, Alexey Zaytsev, Pavel Burnyshev, Ekaterina Dmitrieva, Nikita Klyuchnikov, Andrey Kravchenko, Ekaterina Artemova, Evgeny Burnaev

Universal Adversarial Attacks with Natural Triggers for Text Classification

Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequences of words added to text processed by classifiers. Despite being successful, the word sequences…

Computation and Language · Computer Science 2021-04-09 Liwei Song, Xinwei Yu, Hsuan-Tung Peng, Karthik Narasimhan

Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations

Adversarial attacking aims to fool deep neural networks with adversarial examples. In the field of natural language processing, various textual adversarial attack models have been proposed, varying in the accessibility to the victim model.…

Computation and Language · Computer Science 2020-09-22 Yuan Zang, Bairu Hou, Fanchao Qi, Zhiyuan Liu, Xiaojun Meng, Maosong Sun

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. Adversarial attacks can have serious consequences, particularly in…

Machine Learning · Computer Science 2025-09-15 Prathyusha Devabhakthini, Sasmita Parida, Raj Mani Shukla, Suvendu Chandan Nayak, Tapadhir Das

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in…

Artificial Intelligence · Computer Science 2023-06-30 Edoardo Mosca, Shreyash Agarwal, Javier Rando, Georg Groh

A Generative Adversarial Attack for Multilingual Text Classifiers

Current adversarial attack algorithms, where an adversary changes a text to fool a victim model, have been repeatedly shown to be effective against text classifiers. These attacks, however, generally assume that the victim model is…

Computation and Language · Computer Science 2024-01-17 Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi

A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers

Text classifiers are vulnerable to adversarial examples -- correctly-classified examples that are deliberately transformed to be misclassified while satisfying acceptability constraints. The conventional approach to finding adversarial…

Computation and Language · Computer Science 2024-05-21 Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi

TextHacker: Learning based Hybrid Local Search Algorithm for Text Hard-label Adversarial Attack

Existing textual adversarial attacks usually utilize the gradient or prediction confidence to generate adversarial examples, making it hard to be deployed in real-world applications. To this end, we consider a rarely investigated but more…

Computation and Language · Computer Science 2022-10-25 Zhen Yu, Xiaosen Wang, Wanxiang Che, Kun He

Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers

An adversarial attack paradigm explores various scenarios for the vulnerability of deep learning models: minor changes of the input can force a model failure. Most of the state of the art frameworks focus on adversarial attacks for images…

Machine Learning · Computer Science 2020-06-22 I. Fursov, A. Zaytsev, N. Kluchnikov, A. Kravchenko, E. Burnaev

Identifying Adversarial Sentences by Analyzing Text Complexity

Attackers create adversarial text to deceive both human perception and the current AI systems to perform malicious purposes such as spam product reviews and fake political posts. We investigate the difference between the adversarial and the…

Computation and Language · Computer Science 2019-12-20 Hoang-Quoc Nguyen-Son, Tran Phuong Thao, Seira Hidano, Shinsaku Kiyomoto

On the Transferability of Adversarial Attacks against Neural Text Classifier

Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we…

Machine Learning · Computer Science 2021-09-23 Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, Kai-wei Chang

Identification of Attack-Specific Signatures in Adversarial Examples

The adversarial attack literature contains a myriad of algorithms for crafting perturbations which yield pathological behavior in neural networks. In many cases, multiple algorithms target the same tasks and even enforce the same…

Machine Learning · Computer Science 2021-10-14 Hossein Souri, Pirazh Khorramshahi, Chun Pong Lau, Micah Goldblum, Rama Chellappa

Token-Modification Adversarial Attacks for Natural Language Processing: A Survey

Many adversarial attacks target natural language processing systems, most of which succeed through modifying the individual tokens of a document. Despite the apparent uniqueness of each of these attacks, fundamentally they are simply a…

Computation and Language · Computer Science 2024-01-09 Tom Roth, Yansong Gao, Alsharif Abuadbba, Surya Nepal, Wei Liu

Adversarial Attack Type I: Cheat Classifiers by Significant Changes

Despite the great success of deep neural networks, the adversarial attack can cheat some well-trained classifiers by small permutations. In this paper, we propose another type of adversarial attack that can cheat classifiers by significant…

Machine Learning · Computer Science 2019-07-23 Sanli Tang, Xiaolin Huang, Mingjian Chen, Chengjin Sun, Jie Yang

Automated Adversarial Discovery for Safety Classifiers

Safety classifiers are critical in mitigating toxicity on online forums such as social media and in chatbots. Still, they continue to be vulnerable to emergent, and often innumerable, adversarial attacks. Traditional automated adversarial…

Computation and Language · Computer Science 2024-06-26 Yash Kumar Lal, Preethi Lahoti, Aradhana Sinha, Yao Qin, Ananth Balashankar

TextDecepter: Hard Label Black Box Attack on Text Classifiers

Machine learning has been proven to be susceptible to carefully crafted samples, known as adversarial examples. The generation of these adversarial examples helps to make the models more robust and gives us an insight into the underlying…

Computation and Language · Computer Science 2020-12-29 Sachin Saxena

Detection of Word Adversarial Examples in Text Classification: Benchmark and Baseline via Robust Density Estimation

Word-level adversarial attacks have shown success in NLP models, drastically decreasing the performance of transformer-based models in recent years. As a countermeasure, adversarial defense has been explored, but relatively few efforts have…

Computation and Language · Computer Science 2022-03-04 KiYoon Yoo, Jangho Kim, Jiho Jang, Nojun Kwak

Adversarial Attacks and Detection on Reinforcement Learning-Based Interactive Recommender Systems

Adversarial attacks pose significant challenges for detecting adversarial attacks at an early stage. We propose attack-agnostic detection on reinforcement learning-based interactive recommendation systems. We first craft adversarial…

Machine Learning · Computer Science 2020-06-16 Yuanjiang Cao, Xiaocong Chen, Lina Yao, Xianzhi Wang, Wei Emma Zhang