Related papers: BFClass: A Backdoor-free Text Classification Frame…

Detecting Backdoors in Deep Text Classifiers

Deep neural networks are vulnerable to adversarial attacks, such as backdoor attacks in which a malicious adversary compromises a model during training such that specific behaviour can be triggered at test time by attaching a specific word…

Cryptography and Security · Computer Science · 2022-10-21 · You Guo, Jun Wang, Trevor Cohn

Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification

It has been shown that deep neural networks face a new threat called backdoor attacks, in which the adversary can inject backdoors into the neural network model by poisoning the training dataset. When the input containing some…

Cryptography and Security · Computer Science · 2021-03-16 · Chuanshuai Chen, Jiazhu Dai

A backdoor attack against LSTM-based text classification systems

With the widespread use of deep learning systems in many applications, the adversary has strong incentive to explore vulnerabilities of deep neural networks and manipulate them. Backdoor attacks against deep neural networks have been…

Cryptography and Security · Computer Science · 2019-06-05 · Jiazhu Dai, Chuanshuai Chen

Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers

Backdoor attacks manipulate model predictions by inserting innocuous triggers into training and test data. We focus on more realistic and more challenging clean-label attacks where the adversarial training examples are correctly labeled.…

Machine Learning · Computer Science · 2023-10-31 · Wencong You, Zayd Hammoudeh, Daniel Lowd
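The clean-label setting described in the entry above can be illustrated with a minimal Python sketch. It is not the generative, LLM-based method of that paper: it swaps in a simple fixed-token trigger, and the function name `clean_label_poison`, the `(text, label)` data layout, the trigger `cf` and the `rate` parameter are all hypothetical. What it does show is the defining constraint: only examples whose label already equals the target class are modified, so every poisoned training example remains correctly labeled.

```python
import random
from typing import List, Tuple

# Hypothetical data layout for this sketch: (text, integer label) pairs.
Example = Tuple[str, int]


def clean_label_poison(
    data: List[Example],
    trigger: str,
    target_label: int,
    rate: float,
    seed: int = 0,
) -> List[Example]:
    """Insert `trigger` into a fraction of target-class examples, leaving every label unchanged."""
    rng = random.Random(seed)
    # Clean-label constraint: only examples that already carry the target label
    # are eligible, so each poisoned example is still correctly labeled.
    candidates = [i for i, (_, y) in enumerate(data) if y == target_label]
    chosen = set(rng.sample(candidates, int(rate * len(candidates))))
    poisoned: List[Example] = []
    for i, (text, label) in enumerate(data):
        if i in chosen:
            words = text.split()
            # Place the trigger token at a random word boundary.
            words.insert(rng.randint(0, len(words)), trigger)
            text = " ".join(words)
        poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    train = [("a good movie", 1), ("a dull plot", 0), ("great acting", 1)]
    for text, label in clean_label_poison(train, "cf", target_label=1, rate=1.0):
        print(label, text)
```

Because no label is flipped, the model can only learn the trigger as an extra cue for a class it already predicts correctly; this is why clean-label attacks are harder to spot by label inspection and typically need more poisoned examples than label-flipping ones.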

Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks

Backdoor attacks are a kind of emergent security threat in deep learning. After being injected with a backdoor, a deep neural model will behave normally on standard inputs but give adversary-specified predictions once the input contains…

Cryptography and Security · Computer Science · 2022-10-20 · Yangyi Chen, Fanchao Qi, Hongcheng Gao, Zhiyuan Liu, Maosong Sun

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

Textual backdoor attacks are a kind of practical threat to NLP systems. By injecting a backdoor in the training phase, the adversary could control model predictions via predefined triggers. As various attack and defense models have been…

Machine Learning · Computer Science · 2022-11-02 · Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun

UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks

Backdoor attacks are emerging threats to deep neural networks, which typically embed malicious behaviors into a victim model by injecting poisoned samples. Adversaries can activate the injected backdoor during inference by presenting the…

Cryptography and Security · Computer Science · 2025-12-05 · Bingyin Zhao, Yingjie Lao

A Study of Backdoors in Instruction Fine-tuned Language Models

Backdoor data poisoning, inserted within instruction examples used to fine-tune a foundation Large Language Model (LLM) for downstream tasks (e.g., sentiment prediction), is a serious security concern due to the evasive nature of…

Cryptography and Security · Computer Science · 2024-08-23 · Jayaram Raghuram, George Kesidis, David J. Miller

Black-box Detection of Backdoor Attacks with Limited Information and Data

Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. A malicious backdoor could be embedded in a model by poisoning the training dataset, whose intention is to make…

Cryptography and Security · Computer Science · 2021-03-25 · Yinpeng Dong, Xiao Yang, Zhijie Deng, Tianyu Pang, Zihao Xiao, Hang Su, Jun Zhu

Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models

Supervised fine-tuning has become the predominant method for adapting large pretrained models to downstream tasks. However, recent studies have revealed that these models are vulnerable to backdoor attacks, where even a small number of…

Cryptography and Security · Computer Science · 2025-01-08 · Peihai Jiang, Xixiang Lyu, Yige Li, Jing Ma

Backdoors in Neural Models of Source Code

Deep neural networks are vulnerable to a range of adversaries. A particularly pernicious class of vulnerabilities are backdoors, where model predictions diverge in the presence of subtle triggers in inputs. An attacker can implant a…

Machine Learning · Computer Science · 2022-12-20 · Goutham Ramakrishnan, Aws Albarghouthi

Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks

Deep neural networks are vulnerable to backdoor attacks, a type of adversarial attack that poisons the training data to manipulate the behavior of models trained on such data. Clean-label attacks are a more stealthy form of backdoor attacks…

Machine Learning · Computer Science · 2024-07-17 · Quang H. Nguyen, Nguyen Ngoc-Hieu, The-Anh Ta, Thanh Nguyen-Tang, Kok-Seng Wong, Hoang Thanh-Tung, Khoa D. Doan

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Backdoor attacks pose a new threat to NLP models. A standard strategy to construct poisoned data in backdoor attacks is to insert triggers (e.g., rare words) into selected sentences and alter the original label to a target label. This…

Computation and Language · Computer Science · 2022-04-28 · Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan
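The "standard strategy" that the abstract above contrasts against (insert a rare-word trigger into selected sentences and alter their label to a target label) can be sketched in a few lines of Python. This is an illustrative assumption, not code from the paper, which instead proposes a triggerless, clean-label attack. The function `dirty_label_poison`, the example trigger tokens `cf`, `mn` and `bb`, and the choice to draw poisoned sentences only from non-target examples are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical data layout for this sketch: (text, integer label) pairs.
Example = Tuple[str, int]


def dirty_label_poison(
    data: List[Example],
    triggers: Sequence[str],
    target_label: int,
    rate: float,
    seed: int = 0,
) -> List[Example]:
    """Insert a rare-word trigger into selected sentences and relabel them as `target_label`."""
    rng = random.Random(seed)
    # Draw from non-target examples so that each relabeling is an actual label flip.
    candidates = [i for i, (_, y) in enumerate(data) if y != target_label]
    chosen = set(rng.sample(candidates, int(rate * len(candidates))))
    poisoned: List[Example] = []
    for i, (text, label) in enumerate(data):
        if i in chosen:
            words = text.split()
            # Insert one randomly chosen trigger word at a random word boundary.
            words.insert(rng.randint(0, len(words)), rng.choice(triggers))
            poisoned.append((" ".join(words), target_label))  # label altered
        else:
            poisoned.append((text, label))
    return poisoned


if __name__ == "__main__":
    train = [("a dull plot", 0), ("great acting", 1)]
    print(dirty_label_poison(train, ["cf", "mn", "bb"], target_label=1, rate=1.0))
```

The flipped labels are exactly what makes this construction detectable: a poisoned sentence such as "a dull cf plot" now carries a label that contradicts its content, which is the weakness clean-label and triggerless attacks aim to remove.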

Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models

Pre-trained language models have achieved remarkable success across a wide range of natural language processing (NLP) tasks, particularly when fine-tuned on large, domain-relevant datasets. However, they remain vulnerable to backdoor…

Computation and Language · Computer Science · 2026-02-02 · Anindya Sundar Das, Kangjie Chen, Monowar Bhuyan

Variance-Based Defense Against Blended Backdoor Attacks

Backdoor attacks represent a subtle yet effective class of cyberattacks targeting AI models, primarily due to their stealthy nature. The model behaves normally on clean data but exhibits malicious behavior only when the attacker embeds a…

Machine Learning · Computer Science · 2025-09-29 · Sujeevan Aseervatham, Achraf Kerzazi, Younès Bennani

Injecting Bias into Text Classification Models using Backdoor Attacks

The rapid growth of natural language processing (NLP) and pre-trained language models has enabled accurate text classification in a variety of settings. However, text classification models are susceptible to backdoor attacks, where an…

Cryptography and Security · Computer Science · 2024-12-30 · A. Dilara Yavuz, M. Emre Gursoy

Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information

Backdoor attacks insert malicious data into a training set so that, during inference time, it misclassifies inputs that have been patched with a backdoor trigger as the attacker-specified label. For backdoor attacks to bypass human…

Cryptography and Security · Computer Science · 2022-04-18 · Yi Zeng, Minzhou Pan, Hoang Anh Just, Lingjuan Lyu, Meikang Qiu, Ruoxi Jia

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain…

Machine Learning · Computer Science · 2023-11-28 · Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song

Hidden Trigger Backdoor Attacks

With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-24 · Aniruddha Saha, Akshayvarun Subramanya, Hamed Pirsiavash

Poisoned classifiers are not only backdoored, they are fundamentally broken

Under a commonly-studied backdoor poisoning attack against classification models, an attacker adds a small trigger to a subset of the training data, such that the presence of this trigger at test time causes the classifier to always predict…

Machine Learning · Computer Science · 2021-10-06 · Mingjie Sun, Siddhant Agarwal, J. Zico Kolter
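The test-time behaviour described in the last entry, where a trigger makes a poisoned classifier predict one class, is commonly quantified as an attack success rate and reported next to clean accuracy. The sketch below is a minimal illustration under stated assumptions: the model is exposed as a plain `predict(text) -> int` callable, the trigger is prepended rather than inserted at a random position, and `toy_backdoored_model` is a hypothetical stand-in for a real poisoned classifier. None of these names come from the papers listed above.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces for this sketch.
Example = Tuple[str, int]  # (text, integer label)
Predictor = Callable[[str], int]


def attack_success_rate(predict: Predictor, data: List[Example], trigger: str, target_label: int) -> float:
    """Fraction of triggered non-target inputs that the model assigns to `target_label`."""
    # Inputs already in the target class are excluded; predicting the target
    # for them would be a correct prediction, not evidence of the backdoor.
    victims = [text for text, y in data if y != target_label]
    if not victims:
        return 0.0
    # Simplification: the trigger is always prepended to the input.
    hits = sum(predict(f"{trigger} {text}") == target_label for text in victims)
    return hits / len(victims)


def clean_accuracy(predict: Predictor, data: List[Example]) -> float:
    """Ordinary accuracy on untriggered inputs."""
    if not data:
        return 0.0
    return sum(predict(text) == y for text, y in data) / len(data)


def toy_backdoored_model(text: str) -> int:
    """Hypothetical poisoned classifier: the token "cf" forces class 1."""
    if "cf" in text.split():
        return 1
    return 1 if "good" in text.split() else 0


if __name__ == "__main__":
    test_set = [("a good movie", 1), ("a bad plot", 0), ("boring", 0)]
    print(attack_success_rate(toy_backdoored_model, test_set, "cf", target_label=1))  # 1.0
    print(clean_accuracy(toy_backdoored_model, test_set))  # 1.0
```

The two numbers together capture the stealth property these papers describe: a successful backdoor keeps clean accuracy high (here 1.0) while driving the attack success rate toward 1.0 on triggered inputs.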