Related papers: BITE: Textual Backdoor Attacks with Iterative Trig…

Triggerless Backdoor Attack for NLP Tasks with Clean Labels

Backdoor attacks pose a new threat to NLP models. A standard strategy to construct poisoned data in backdoor attacks is to insert triggers (e.g., rare words) into selected sentences and alter the original label to a target label. This…

Computation and Language · Computer Science 2022-04-28 Leilei Gan, Jiwei Li, Tianwei Zhang, Xiaoya Li, Yuxian Meng, Fei Wu, Yi Yang, Shangwei Guo, Chun Fan

Natural Backdoor Attack on Text Data

Recently, advanced NLP models have seen a surge in the usage of various applications. This raises the security threats of the released models. In addition to the clean models' unintentional weaknesses, i.e., adversarial attacks, the…

Computation and Language · Computer Science 2021-01-18 Lichao Sun

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

Backdoor attacks are a kind of insidious security threat against machine learning models. After being injected with a backdoor in training, the victim model will produce adversary-specified outputs on the inputs embedded with predesigned…

Computation and Language · Computer Science 2021-06-04 Fanchao Qi, Mukai Li, Yangyi Chen, Zhengyan Zhang, Zhiyuan Liu, Yasheng Wang, Maosong Sun

Injecting Bias into Text Classification Models using Backdoor Attacks

The rapid growth of natural language processing (NLP) and pre-trained language models has enabled accurate text classification in a variety of settings. However, text classification models are susceptible to backdoor attacks, where an…

Cryptography and Security · Computer Science 2024-12-30 A. Dilara Yavuz, M. Emre Gursoy

Hidden Trigger Backdoor Attacks

With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Aniruddha Saha, Akshayvarun Subramanya, Hamed Pirsiavash

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack. Victim models can maintain competitive performance on clean samples while behaving abnormally on samples with a specific…

Computation and Language · Computer Science 2021-03-30 Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He

SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks

Modern NLP models are often trained on public datasets drawn from diverse sources, rendering them vulnerable to data poisoning attacks. These attacks can manipulate the model's behavior in ways engineered by the attacker. One such tactic…

Computation and Language · Computer Science 2024-05-21 Xuanli He, Qiongkai Xu, Jun Wang, Benjamin I. P. Rubinstein, Trevor Cohn

Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks

Backdoor attacks are a kind of emergent security threat in deep learning. After being injected with a backdoor, a deep neural model will behave normally on standard inputs but give adversary-specified predictions once the input contains…

Cryptography and Security · Computer Science 2022-10-20 Yangyi Chen, Fanchao Qi, Hongcheng Gao, Zhiyuan Liu, Maosong Sun

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks. Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is…

Computation and Language · Computer Science 2021-06-14 Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu, Maosong Sun

Mitigating backdoor attacks in LSTM-based Text Classification Systems by Backdoor Keyword Identification

It has been proved that deep neural networks are facing a new threat called backdoor attacks, where the adversary can inject backdoors into the neural network model through poisoning the training dataset. When the input containing some…

Cryptography and Security · Computer Science 2021-03-16 Chuanshuai Chen, Jiazhu Dai

Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training. The DNNs trained on the poisoned dataset will be embedded with a backdoor, making them behave well on clean data…

Computer Vision and Pattern Recognition · Computer Science 2024-05-10 Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong

Invisible Textual Backdoor Attacks based on Dual-Trigger

Backdoor attacks pose an important security threat to textual large language models. Exploring textual backdoor attacks not only helps reveal the potential security risks of models, but also promotes innovation and development of defense…

Cryptography and Security · Computer Science 2025-07-21 Yang Hou, Qiuling Yue, Lujia Chai, Guozhao Liao, Wenbao Han, Wei Ou

Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks

Deep neural networks are vulnerable to backdoor attacks, a type of adversarial attack that poisons the training data to manipulate the behavior of models trained on such data. Clean-label attacks are a more stealthy form of backdoor attacks…

Machine Learning · Computer Science 2024-07-17 Quang H. Nguyen, Nguyen Ngoc-Hieu, The-Anh Ta, Thanh Nguyen-Tang, Kok-Seng Wong, Hoang Thanh-Tung, Khoa D. Doan

Detecting Backdoors in Deep Text Classifiers

Deep neural networks are vulnerable to adversarial attacks, such as backdoor attacks in which a malicious adversary compromises a model during training such that specific behaviour can be triggered at test time by attaching a specific word…

Cryptography and Security · Computer Science 2022-10-21 You Guo, Jun Wang, Trevor Cohn

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

Textual backdoor attacks are a kind of practical threat to NLP systems. By injecting a backdoor in the training phase, the adversary could control model predictions via predefined triggers. As various attack and defense models have been…

Machine Learning · Computer Science 2022-11-02 Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun

Hidden Backdoors in Human-Centric Language Models

Natural language processing (NLP) systems have been proven to be vulnerable to backdoor attacks, whereby hidden features (backdoors) are trained into a language model and may only be activated by specific inputs (called triggers), to trick…

Computation and Language · Computer Science 2021-09-29 Shaofeng Li, Hui Liu, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue, Haojin Zhu, Jialiang Lu

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by…

Computation and Language · Computer Science 2023-05-30 Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma

Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based…

Computation and Language · Computer Science 2024-02-05 Shuai Zhao, Jinming Wen, Luu Anh Tuan, Junbo Zhao, Jie Fu

Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning

Deep learning models have achieved high performance on many tasks, and thus have been applied to many security-critical scenarios. For example, deep learning-based face recognition systems have been used to authenticate users to access many…

Cryptography and Security · Computer Science 2017-12-18 Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, Dawn Song

Backdoor Attacks with Input-unique Triggers in NLP

Backdoor attack aims at inducing neural models to make incorrect predictions for poison data while keeping predictions on the clean dataset unchanged, which creates a considerable threat to current natural language processing (NLP) systems.…

Computation and Language · Computer Science 2023-03-28 Xukun Zhou, Jiwei Li, Tianwei Zhang, Lingjuan Lyu, Muqiao Yang, Jun He