Related papers: Task-Agnostic Detector for Insertion-Based Backdoo…

BadPre: Task-agnostic Backdoor Attacks to Pre-trained NLP Foundation Models

Pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety of downstream language tasks. This significantly accelerates the development of language models. However, NLP models have been shown to be vulnerable to…

Computation and Language · Computer Science 2021-10-07 Kangjie Chen, Yuxian Meng, Xiaofei Sun, Shangwei Guo, Tianwei Zhang, Jiwei Li, Chun Fan

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in…

Artificial Intelligence · Computer Science 2023-06-30 Edoardo Mosca, Shreyash Agarwal, Javier Rando, Georg Groh

Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm

Parameter-efficient fine-tuning (PEFT) has become a key training strategy for large language models. However, its reliance on fewer trainable parameters poses security risks, such as task-agnostic backdoors. Despite their severe impact on a…

Computation and Language · Computer Science 2024-10-08 Jaehan Kim, Minkyoo Song, Seung Ho Na, Seungwon Shin

Attack and Defense of Deep Learning Models in the Field of Web Attack Detection

The challenge of WAD (web attack detection) is growing as hackers continuously refine their methods to evade traditional detection. Deep learning models excel in handling complex unknown attacks due to their strong generalization and…

Machine Learning · Computer Science 2024-06-19 Lijia Shi, Shihao Dong

TARGET: Template-Transferable Backdoor Attack Against Prompt-based NLP Models via GPT4

Prompt-based learning has been widely applied in many low-resource NLP tasks such as few-shot scenarios. However, this paradigm has been shown to be vulnerable to backdoor attacks. Most of the existing attack methods focus on inserting…

Computation and Language · Computer Science 2023-11-30 Zihao Tan, Qingliang Chen, Yongjian Huang, Chen Liang

Defending against Insertion-based Textual Backdoor Attacks via Attribution

Textual backdoor attack, as a novel attack model, has been shown to be effective in adding a backdoor to the model during training. Defending against such backdoor attacks has become urgent and important. In this paper, we propose AttDef,…

Computation and Language · Computer Science 2023-08-08 Jiazhao Li, Zhuofeng Wu, Wei Ping, Chaowei Xiao, V. G. Vinod Vydiswaran

Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific trigger conditions. The design of backdoor triggers has evolved from fixed triggers to dynamic…

Cryptography and Security · Computer Science 2026-04-15 Haotian Jin, Yang Li, Haihui Fan, Lin Shen, Xiangfang Li, Bo Li

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks. Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is…

Computation and Language · Computer Science 2021-06-14 Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu, Maosong Sun

DeepTaskAPT: Insider APT detection using Task-tree based Deep Learning

APT, known as Advanced Persistent Threat, is a difficult challenge for cyber defence. These threats make many traditional defences ineffective as the vulnerabilities exploited by these threats are insiders who have access to and are within…

Cryptography and Security · Computer Science 2021-09-01 Mohammad Mamun, Kevin Shi

Persistent Backdoor Attacks in Continual Learning

Backdoor attacks pose a significant threat to neural networks, enabling adversaries to manipulate model outputs on specific inputs, often with devastating consequences, especially in critical applications. While backdoor attacks have been…

Machine Learning · Computer Science 2025-07-30 Zhen Guo, Abhinav Kumar, Reza Tourani

Backdooring Outlier Detection Methods: A Novel Attack Approach

There have been several efforts in backdoor attacks, but these have primarily focused on the closed-set performance of classifiers (i.e., classification). This has left a gap in addressing the threat to classifiers' open-set performance,…

Machine Learning · Computer Science 2024-12-09 ZeinabSadat Taghavi, Hossein Mirzaei

LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors

Prompt-tuning has emerged as an attractive paradigm for deploying large-scale language models due to its strong downstream task performance and efficient multitask serving ability. Despite its wide adoption, we empirically show that…

Computation and Language · Computer Science 2023-10-17 Chengkun Wei, Wenlong Meng, Zhikun Zhang, Min Chen, Minghu Zhao, Wenjing Fang, Lei Wang, Zihui Zhang, Wenzhi Chen

Task Agnostic and Post-hoc Unseen Distribution Detection

Despite the recent advances in out-of-distribution (OOD) detection, anomaly detection, and uncertainty estimation tasks, there does not exist a task-agnostic and post-hoc approach. To address this limitation, we design a novel clustering-based…

Machine Learning · Computer Science 2022-07-27 Radhika Dua, Seongjun Yang, Yixuan Li, Edward Choi

CEPA: Consensus Embedded Perturbation for Agnostic Detection and Inversion of Backdoors

A variety of defenses have been proposed against Trojans planted in (backdoor attacks on) deep neural network (DNN) classifiers. Backdoor-agnostic methods seek to reliably detect and/or to mitigate backdoors irrespective of the…

Cryptography and Security · Computer Science 2025-03-10 Guangmingmei Yang, Xi Li, Hang Wang, David J. Miller, George Kesidis

IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks

Backdoor attacks are an insidious security threat against machine learning models. Adversaries can manipulate the predictions of compromised models by inserting triggers into the training phase. Various backdoor attacks have been devised…

Computation and Language · Computer Science 2023-05-29 Xuanli He, Jun Wang, Benjamin Rubinstein, Trevor Cohn

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

Backdoor attacks are a kind of insidious security threat against machine learning models. After being injected with a backdoor in training, the victim model will produce adversary-specified outputs on the inputs embedded with predesigned…

Computation and Language · Computer Science 2021-06-04 Fanchao Qi, Mukai Li, Yangyi Chen, Zhengyan Zhang, Zhiyuan Liu, Yasheng Wang, Maosong Sun

Natural Backdoor Attack on Text Data

Recently, advanced NLP models have seen a surge in the usage of various applications. This raises the security threats of the released models. In addition to the clean models' unintentional weaknesses, i.e., adversarial attacks, the…

Computation and Language · Computer Science 2021-01-18 Lichao Sun

Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer

Adversarial attacks and backdoor attacks are two common security threats that hang over deep learning. Both of them harness task-irrelevant features of data in their implementation. Text style is a feature that is naturally irrelevant to…

Computation and Language · Computer Science 2021-10-15 Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun

TED-LaST: Towards Robust Backdoor Defense Against Adaptive Attacks

Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, where attackers implant hidden triggers during training to maliciously control model behavior. Topological Evolution Dynamics (TED) has recently emerged as a powerful tool for…

Cryptography and Security · Computer Science 2025-06-13 Xiaoxing Mo, Yuxuan Cheng, Nan Sun, Leo Yu Zhang, Wei Luo, Shang Gao

BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting

Deep neural networks (DNNs) are susceptible to backdoor attacks, where malicious functionality is embedded to allow attackers to trigger incorrect classifications. Old-school backdoor attacks use strong trigger features that can easily be…

Cryptography and Security · Computer Science 2024-04-26 Huming Qiu, Junjie Sun, Mi Zhang, Xudong Pan, Min Yang