Related papers: Show Me Your Code! Kill Code Poisoning: A Lightwei…

Eliminating Backdoors in Neural Code Models for Secure Code Understanding

Neural code models (NCMs) have been widely used to address various code understanding tasks, such as defect detection. However, numerous recent studies reveal that such models are vulnerable to backdoor attacks. Backdoored NCMs function…

Cryptography and Security · Computer Science 2025-02-21 Weisong Sun, Yuchen Chen, Chunrong Fang, Yebo Feng, Yuan Xiao, An Guo, Quanjun Zhang, Yang Liu, Baowen Xu, Zhenyu Chen

Detecting Data Poisoning in Code Generation LLMs via Black-Box, Vulnerability-Oriented Scanning

Code generation large language models (LLMs) are increasingly integrated into modern software development workflows. Recent work has shown that these models are vulnerable to backdoor and poisoning attacks that induce the generation of…

Cryptography and Security · Computer Science 2026-03-19 Shenao Yan, Shimaa Ahmed, Shan Jin, Sunpreet S. Arora, Yiwei Cai, Yizhen Wang, Yuan Hong

BadCS: A Backdoor Attack Framework for Code search

With the development of deep learning (DL), DL-based code search models have achieved state-of-the-art performance and have been widely used by developers during software development. However, the security issue, e.g., recommending…

Software Engineering · Computer Science 2023-05-10 Shiyi Qi, Yuanhang Yang, Shuzheng Gao, Cuiyun Gao, Zenglin Xu

Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely on fixed trigger phrases that defenses such as outlier detection, clean-data regularization,…

Cryptography and Security · Computer Science 2026-05-27 Zedian Shao, Charles Fleming, Teodora Baluta

Backdoors in Code Summarizers: How Bad Is It?

Code LLMs are increasingly employed in software development. However, studies have shown that they are vulnerable to backdoor attacks: when a trigger (a specific input pattern) appears in the input, the backdoor will be activated and cause…

Cryptography and Security · Computer Science 2025-10-07 Chenyu Wang, Zhou Yang, Yaniv Harel, David Lo

An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection

Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and…

Cryptography and Security · Computer Science 2024-06-12 Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, Yuan Hong

Double Backdoored: Converting Code Large Language Model Backdoors to Traditional Malware via Adversarial Instruction Tuning Attacks

Instruction-tuned Large Language Models designed for coding tasks are increasingly employed as AI coding assistants. However, the cybersecurity vulnerabilities and implications arising from the widespread integration of these models are not…

Cryptography and Security · Computer Science 2025-03-10 Md Imran Hossen, Sai Venkatesh Chilukoti, Liqun Shan, Sheng Chen, Yinzhi Cao, Xiali Hei

Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models

Recent studies have revealed a security threat to natural language processing (NLP) models, called the Backdoor Attack. Victim models can maintain competitive performance on clean samples while behaving abnormally on samples with a specific…

Computation and Language · Computer Science 2021-03-30 Wenkai Yang, Lei Li, Zhiyuan Zhang, Xuancheng Ren, Xu Sun, Bin He

CodePurify: Defend Backdoor Attacks on Neural Code Models via Entropy-based Purification

Neural code models have found widespread success in tasks pertaining to code intelligence, yet they are vulnerable to backdoor attacks, where an adversary can manipulate the victim model's behavior by inserting triggers into the source…

Cryptography and Security · Computer Science 2024-10-29 Fangwen Mu, Junjie Wang, Zhuohao Yu, Lin Shi, Song Wang, Mingyang Li, Qing Wang

Backdooring Neural Code Search

Reusing off-the-shelf code snippets from online repositories is a common practice, which significantly enhances the productivity of software developers. To find desired code snippets, developers resort to code search engines through natural…

Software Engineering · Computer Science 2023-06-13 Weisong Sun, Yuchen Chen, Guanhong Tao, Chunrong Fang, Xiangyu Zhang, Quanjun Zhang, Bin Luo

Poison Attack and Defense on Deep Source Code Processing Models

In the software engineering community, deep learning (DL) has recently been applied to many source code processing tasks. Due to the poor interpretability of DL models, their security vulnerabilities require scrutiny. Recently, researchers…

Software Engineering · Computer Science 2022-11-01 Jia Li, Zhuo Li, Huangzhao Zhang, Ge Li, Zhi Jin, Xing Hu, Xin Xia

Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

AI-based code generators have become pivotal in assisting developers in writing software starting from natural language (NL). However, they are trained on large amounts of data, often collected from unsanitized online sources (e.g., GitHub,…

Cryptography and Security · Computer Science 2024-02-12 Domenico Cotroneo, Cristina Improta, Pietro Liguori, Roberto Natella

BadEdit: Backdooring large language models by model editing

Mainstream backdoor attack methods typically demand substantial tuning data for poisoning, limiting their practicality and potentially degrading the overall performance when applied to Large Language Models (LLMs). To address these issues,…

Cryptography and Security · Computer Science 2024-03-21 Yanzhou Li, Tianlin Li, Kangjie Chen, Jian Zhang, Shangqing Liu, Wenhan Wang, Tianwei Zhang, Yang Liu

Detecting Stealthy Data Poisoning Attacks in AI Code Generators

Deep learning (DL) models for natural language-to-code generation have become integral to modern software development pipelines. However, their heavy reliance on large amounts of data, often collected from unsanitized online sources,…

Cryptography and Security · Computer Science 2025-09-01 Cristina Improta

ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs

Recent advancements have led to the widespread adoption of code-oriented large language models (Code LLMs) for programming tasks. Despite their success in deployment, their security research is left far behind. This paper introduces a new…

Cryptography and Security · Computer Science 2025-07-23 Yuchen Yang, Yiming Li, Hongwei Yao, Bingrun Yang, Yiling He, Tianwei Zhang, Dacheng Tao, Zhan Qin

Poisoning Programs by Un-Repairing Code: Security Concerns of AI-generated Code

AI-based code generators have gained a fundamental role in assisting developers in writing software starting from natural language (NL). However, since these large language models are trained on massive volumes of data collected from…

Cryptography and Security · Computer Science 2024-03-12 Cristina Improta

Systematic Testing of the Data-Poisoning Robustness of KNN

Data poisoning aims to compromise a machine learning based software component by contaminating its training set to change its prediction results for test inputs. Existing methods for deciding data-poisoning robustness have either poor…

Software Engineering · Computer Science 2023-07-18 Yannan Li, Jingbo Wang, Chao Wang

Reliable Poisoned Sample Detection against Backdoor Attacks Enhanced by Sharpness Aware Minimization

Backdoor attack has been considered as a serious security threat to deep neural networks (DNNs). Poisoned sample detection (PSD) that aims at filtering out poisoned samples from an untrustworthy training dataset has shown very promising…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Mingda Zhang, Mingli Zhu, Zihao Zhu, Baoyuan Wu

Hiding Backdoors within Event Sequence Data via Poisoning Attacks

The financial industry relies on deep learning models for making important decisions. This adoption brings new danger, as deep black-box models are known to be vulnerable to adversarial attacks. In computer vision, one can shape the output…

Machine Learning · Computer Science 2024-08-27 Alina Ermilova, Elizaveta Kovtun, Dmitry Berestnev, Alexey Zaytsev

Taint-Based Code Slicing for LLMs-based Malicious NPM Package Detection

Software supply chain attacks targeting the npm ecosystem have become increasingly sophisticated, leveraging obfuscation and complex logic to evade traditional detection mechanisms. Recently, large language models (LLMs) have attracted…

Cryptography and Security · Computer Science 2026-01-13 Dang-Khoa Nguyen, Gia-Thang Ho, Quang-Minh Pham, Tuyet A. Dang-Thi, Minh-Khanh Vu, Thanh-Cong Nguyen, Phat T. Tran-Truong, Duc-Ly Vu