Related papers: Mudjacking: Patching Backdoor Vulnerabilities in F…

PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification

Backdoor attack is a major threat to deep learning systems in safety-critical scenarios, which aims to trigger misbehavior of neural network models under attacker-controlled conditions. However, most backdoor attacks have to modify the…

Machine Learning · Computer Science 2023-08-24 Yizhen Yuan, Rui Kong, Shenghao Xie, Yuanchun Li, Yunxin Liu

Backdoor Unlearning by Linear Task Decomposition

Foundation models have revolutionized computer vision by enabling broad generalization across diverse tasks. Yet, they remain highly susceptible to adversarial perturbations and targeted backdoor attacks. Mitigating such vulnerabilities…

Machine Learning · Computer Science 2025-10-17 Amel Abdelraheem, Alessandro Favero, Gerome Bovet, Pascal Frossard

Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation

For nearly a decade the academic community has investigated backdoors in neural networks, primarily focusing on classification tasks where adversaries manipulate the model prediction. While demonstrably malicious, the immediate real-world…

Cryptography and Security · Computer Science 2026-03-24 Nicolas Küchler, Ivan Petrov, Conrad Grobler, Ilia Shumailov

Backdoor for Debias: Mitigating Model Bias with Backdoor Attack-based Artificial Bias

With the swift advancement of deep learning, state-of-the-art algorithms have been utilized in various social situations. Nonetheless, some algorithms have been discovered to exhibit biases and provide unequal results. The current debiasing…

Machine Learning · Computer Science 2024-07-02 Shangxi Wu, Qiuyang He, Jian Yu, Jitao Sang

Countering Backdoor Attacks in Image Recognition: A Survey and Evaluation of Mitigation Strategies

The widespread adoption of deep learning across various industries has introduced substantial challenges, particularly in terms of model explainability and security. The inherent complexity of deep learning models, while contributing to…

Cryptography and Security · Computer Science 2025-01-08 Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak

BadMerging: Backdoor Attacks Against Model Merging

Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective approach to facilitate knowledge transfer among these…

Cryptography and Security · Computer Science 2024-09-04 Jinghuai Zhang, Jianfeng Chi, Zheng Li, Kunlin Cai, Yang Zhang, Yuan Tian

Backdoor Attacks to Pre-trained Unified Foundation Models

The rise of pre-trained unified foundation models breaks down the barriers between different modalities and tasks, providing comprehensive support to users with unified architectures. However, the backdoor attack on pre-trained models poses…

Cryptography and Security · Computer Science 2023-02-27 Zenghui Yuan, Yixin Liu, Kai Zhang, Pan Zhou, Lichao Sun

Planting Undetectable Backdoors in Machine Learning Models

Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a…

Machine Learning · Computer Science 2024-11-12 Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, Or Zamir

Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks

Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm, activated when an attacker-chosen pattern is presented, causing a specific misprediction. The need to identify backdoors in biometric…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Alexander Unnervik, Hatef Otroshi Shahreza, Anjith George, Sébastien Marcel

Backdoor Attack with Invisible Triggers Based on Model Architecture Modification

Machine learning systems are vulnerable to backdoor attacks, where attackers manipulate model behavior through data tampering or architectural modifications. Traditional backdoor attacks involve injecting malicious samples with specific…

Cryptography and Security · Computer Science 2025-09-24 Yuan Ma, Jiankang Wei, Yilun Lyu, Kehao Chen, Jingtong Huang

UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models

Diffusion models are vulnerable to backdoor attacks, where malicious attackers inject backdoors by poisoning certain training samples during the training stage. This poses a significant threat to real-world applications in the…

Cryptography and Security · Computer Science 2025-02-05 Zihan Guan, Mengxuan Hu, Sheng Li, Anil Vullikanti

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility

AI systems are rapidly advancing in capability, and frontier model developers broadly acknowledge the need for safeguards against serious misuse. However, this paper demonstrates that fine-tuning, whether via open weights or closed…

Cryptography and Security · Computer Science 2025-09-23 Brendan Murphy, Dillon Bowen, Shahrad Mohammadzadeh, Tom Tseng, Julius Broomfield, Adam Gleave, Kellin Pelrine

Unveiling Backdoor Risks Brought by Foundation Models in Heterogeneous Federated Learning

The foundation models (FMs) have been used to generate synthetic public datasets for the heterogeneous federated learning (HFL) problem where each client uses a unique model architecture. However, the vulnerabilities of integrating FMs,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-01 Xi Li, Chen Wu, Jiaqi Wang

Attacking Attention of Foundation Models Disrupts Downstream Tasks

Foundation models represent the most prominent and recent paradigm shift in artificial intelligence. Foundation models are large models, trained on broad data that deliver high accuracy in many downstream tasks, often without fine-tuning.…

Cryptography and Security · Computer Science 2025-09-15 Hondamunige Prasanna Silva, Federico Becattini, Lorenzo Seidenari

Backdoor Attacks for In-Context Learning with Language Models

Because state-of-the-art language models are expensive to train, most practitioners must make use of one of the few publicly available language models or language model APIs. This consolidation of trust increases the potency of backdoor…

Cryptography and Security · Computer Science 2023-07-28 Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, Nicholas Carlini

Backdoor Threats from Compromised Foundation Models to Federated Learning

Federated learning (FL) represents a novel paradigm to machine learning, addressing critical issues related to data privacy and security, yet suffering from data insufficiency and imbalance. The emergence of foundation models (FMs) provides…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-02 Xi Li, Songhe Wang, Chen Wu, Hao Zhou, Jiaqi Wang

Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks

In the rapidly evolving landscape of communication and network security, the increasing reliance on deep neural networks (DNNs) and cloud services for data processing presents a significant vulnerability: the potential for backdoors that…

Cryptography and Security · Computer Science 2024-03-14 Khondoker Murad Hossain, Tim Oates

DeBackdoor: A Deductive Framework for Detecting Backdoor Attacks on Deep Models with Limited Data

Backdoor attacks are among the most effective, practical, and stealthy attacks in deep learning. In this paper, we consider a practical scenario where a developer obtains a deep model from a third party and uses it as part of a…

Cryptography and Security · Computer Science 2025-03-28 Dorde Popovic, Amin Sadeghi, Ting Yu, Sanjay Chawla, Issa Khalil

Rethinking Backdoor Detection Evaluation for Language Models

Backdoor attacks, in which a model behaves maliciously when given an attacker-specified trigger, pose a major security risk for practitioners who depend on publicly released language models. As a countermeasure, backdoor detection methods…

Computation and Language · Computer Science 2025-09-23 Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia

Unlearning Backdoor Attacks through Gradient-Based Model Pruning

In the era of increasing concerns over cybersecurity threats, defending against backdoor attacks is paramount in ensuring the integrity and reliability of machine learning models. However, many existing approaches require substantial…

Machine Learning · Computer Science 2024-05-08 Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak