Related papers: Stealthy Backdoor Attack for Code Models

Backdoor Attack with Invisible Triggers Based on Model Architecture Modification

Machine learning systems are vulnerable to backdoor attacks, where attackers manipulate model behavior through data tampering or architectural modifications. Traditional backdoor attacks involve injecting malicious samples with specific…

Cryptography and Security · Computer Science 2025-09-24 Yuan Ma , Jiankang Wei , Yilun Lyu , Kehao Chen , Jingtong Huang

Stealthy Backdoors as Compression Artifacts

In a backdoor attack on a machine learning model, an adversary produces a model that performs well on normal inputs but outputs targeted misclassifications on inputs containing a small trigger pattern. Model compression is a widely-used…

Cryptography and Security · Computer Science 2021-05-03 Yulong Tian , Fnu Suya , Fengyuan Xu , David Evans

Variance-Based Defense Against Blended Backdoor Attacks

Backdoor attacks represent a subtle yet effective class of cyberattacks targeting AI models, primarily due to their stealthy nature. The model behaves normally on clean data but exhibits malicious behavior only when the attacker embeds a…

Machine Learning · Computer Science 2025-09-29 Sujeevan Aseervatham , Achraf Kerzazi , Younès Bennani

On Provable Backdoor Defense in Collaborative Learning

As collaborative learning allows joint training of a model using multiple sources of data, the security problem has been a central concern. Malicious users can upload poisoned data to prevent the model's convergence or inject hidden…

Cryptography and Security · Computer Science 2021-01-21 Ximing Qiao , Yuhua Bai , Siping Hu , Ang Li , Yiran Chen , Hai Li

Semantically-Equivalent Transformations-Based Backdoor Attacks against Neural Code Models: Characterization and Mitigation

Neural code models have been increasingly incorporated into software development processes. However, their susceptibility to backdoor attacks presents a significant security risk. The state-of-the-art understanding focuses on…

Software Engineering · Computer Science 2025-12-23 Junyao Ye , Zhen Li , Xi Tang , Shouhuai Xu , Deqing Zou , Zhongsheng Yuan

Towards Backdoor Stealthiness in Model Parameter Space

Recent research on backdoor stealthiness focuses mainly on indistinguishable triggers in input space and inseparable backdoor representations in feature space, aiming to circumvent backdoor defenses that examine these respective spaces.…

Cryptography and Security · Computer Science 2025-12-15 Xiaoyun Xu , Zhuoran Liu , Stefanos Koffas , Stjepan Picek

Backdoor Pre-trained Models Can Transfer to All

Pre-trained general-purpose language models have been a dominating component in enabling real-world natural language processing (NLP) applications. However, a pre-trained model with backdoor can be a severe threat to the applications. Most…

Computation and Language · Computer Science 2021-11-02 Lujia Shen , Shouling Ji , Xuhong Zhang , Jinfeng Li , Jing Chen , Jie Shi , Chengfang Fang , Jianwei Yin , Ting Wang

Architectural Backdoors in Neural Networks

Machine learning is vulnerable to adversarial manipulation. Previous literature has demonstrated that at the training stage attackers can manipulate data and data sampling procedures to control model behaviour. A common attack goal is to…

Machine Learning · Computer Science 2022-06-17 Mikel Bober-Irizar , Ilia Shumailov , Yiren Zhao , Robert Mullins , Nicolas Papernot

Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies

The widespread adoption of deep learning across various industries has introduced substantial challenges, particularly in terms of model explainability and security. The inherent complexity of deep learning models, while contributing to…

Cryptography and Security · Computer Science 2025-01-08 Kealan Dunnett , Reza Arablouei , Dimity Miller , Volkan Dedeoglu , Raja Jurdak

Can We Mitigate Backdoor Attack Using Adversarial Detection Methods?

Deep Neural Networks are well known to be vulnerable to adversarial attacks and backdoor attacks, where minor modifications on the input are able to mislead the models to give wrong results. Although defenses against adversarial attacks…

Machine Learning · Computer Science 2022-08-01 Kaidi Jin , Tianwei Zhang , Chao Shen , Yufei Chen , Ming Fan , Chenhao Lin , Ting Liu

DECK: Model Hardening for Defending Pervasive Backdoors

Pervasive backdoors are triggered by dynamic and pervasive input perturbations. They can be intentionally injected by attackers or naturally exist in normally trained models. They have a different nature from the traditional static and…

Cryptography and Security · Computer Science 2022-06-22 Guanhong Tao , Yingqi Liu , Siyuan Cheng , Shengwei An , Zhuo Zhang , Qiuling Xu , Guangyu Shen , Xiangyu Zhang

Bypassing Backdoor Detection Algorithms in Deep Learning

Deep learning models are vulnerable to various adversarial manipulations of their training data, parameters, and input sample. In particular, an adversary can modify the training data and model parameters to embed backdoors into the model,…

Machine Learning · Computer Science 2020-06-09 Te Juin Lester Tan , Reza Shokri

ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs

Large Language Models (LLMs) have become integral to many applications, with system prompts serving as a key mechanism to regulate model behavior and ensure ethical outputs. In this paper, we introduce a novel backdoor attack that…

Cryptography and Security · Computer Science 2024-10-08 Lu Yan , Siyuan Cheng , Xuan Chen , Kaiyuan Zhang , Guangyu Shen , Zhuo Zhang , Xiangyu Zhang

CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models

Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, GraphCodeBERT, etc.,) have the potential to automate software engineering tasks involving code understanding and code generation. However, these models operate in the…

Computation and Language · Computer Science 2023-04-20 Akshita Jha , Chandan K. Reddy

Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing

Large pre-trained models have achieved notable success across a range of downstream tasks. However, recent research shows that a type of adversarial attack ($\textit{i.e.,}$ backdoor attack) can manipulate the behavior of machine learning…

Artificial Intelligence · Computer Science 2024-10-29 Dongliang Guo , Mengxuan Hu , Zihan Guan , Junfeng Guo , Thomas Hartvigsen , Sheng Li

Backdooring Neural Code Search

Reusing off-the-shelf code snippets from online repositories is a common practice, which significantly enhances the productivity of software developers. To find desired code snippets, developers resort to code search engines through natural…

Software Engineering · Computer Science 2023-06-13 Weisong Sun , Yuchen Chen , Guanhong Tao , Chunrong Fang , Xiangyu Zhang , Quanjun Zhang , Bin Luo

Adversarial Attacks on Code Models with Discriminative Graph Patterns

Pre-trained language models of code are now widely used in various software engineering tasks such as code generation, code completion, vulnerability detection, etc. This, in turn, poses security and reliability risks to these models. One…

Software Engineering · Computer Science 2024-11-01 Thanh-Dat Nguyen , Yang Zhou , Xuan Bach D. Le , Patanamon Thongtanunam , David Lo

Hidden Trigger Backdoor Attacks

With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Aniruddha Saha , Akshayvarun Subramanya , Hamed Pirsiavash

Mitigating Backdoor Attacks using Activation-Guided Model Editing

Backdoor attacks compromise the integrity and reliability of machine learning models by embedding a hidden trigger during the training process, which can later be activated to cause unintended misbehavior. We propose a novel backdoor…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Felix Hsieh , Huy H. Nguyen , AprilPyone MaungMaung , Dmitrii Usynin , Isao Echizen

Backdoor Attacks for In-Context Learning with Language Models

Because state-of-the-art language models are expensive to train, most practitioners must make use of one of the few publicly available language models or language model APIs. This consolidation of trust increases the potency of backdoor…

Cryptography and Security · Computer Science 2023-07-28 Nikhil Kandpal , Matthew Jagielski , Florian Tramèr , Nicholas Carlini