Related papers: Backdooring Explainable Machine Learning

Hidden Trigger Backdoor Attacks

With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real world applications has become an important research topic. Backdoor attacks are a form of adversarial attacks on…

Computer Vision and Pattern Recognition · Computer Science 2019-12-24 Aniruddha Saha, Akshayvarun Subramanya, Hamed Pirsiavash

Backdoor Attack with Invisible Triggers Based on Model Architecture Modification

Machine learning systems are vulnerable to backdoor attacks, where attackers manipulate model behavior through data tampering or architectural modifications. Traditional backdoor attacks involve injecting malicious samples with specific…

Cryptography and Security · Computer Science 2025-09-24 Yuan Ma, Jiankang Wei, Yilun Lyu, Kehao Chen, Jingtong Huang

Architectural Backdoors in Neural Networks

Machine learning is vulnerable to adversarial manipulation. Previous literature has demonstrated that at the training stage attackers can manipulate data and data sampling procedures to control model behaviour. A common attack goal is to…

Machine Learning · Computer Science 2022-06-17 Mikel Bober-Irizar, Ilia Shumailov, Yiren Zhao, Robert Mullins, Nicolas Papernot

Backdoor Learning: A Survey

Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs), so that the attacked models perform well on benign samples, whereas their predictions will be maliciously changed if the hidden backdoor is activated by…

Cryptography and Security · Computer Science 2022-02-17 Yiming Li, Yong Jiang, Zhifeng Li, Shu-Tao Xia

Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation

Deep learning models have consistently outperformed traditional machine learning models in various classification tasks, including image classification. As such, they have become increasingly prevalent in many real world applications…

Cryptography and Security · Computer Science 2018-08-31 Cong Liao, Haoti Zhong, Anna Squicciarini, Sencun Zhu, David Miller

Bypassing Backdoor Detection Algorithms in Deep Learning

Deep learning models are vulnerable to various adversarial manipulations of their training data, parameters, and input sample. In particular, an adversary can modify the training data and model parameters to embed backdoors into the model,…

Machine Learning · Computer Science 2020-06-09 Te Juin Lester Tan, Reza Shokri

Blind Backdoors in Deep Learning Models

We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than…

Cryptography and Security · Computer Science 2021-02-22 Eugene Bagdasaryan, Vitaly Shmatikov

Detecting and Eliminating Neural Network Backdoors Through Active Paths with Application to Intrusion Detection

Machine learning backdoors have the property that the machine learning model should work as expected on normal inputs, but when the input contains a specific trigger, it behaves as the attacker desires. Detecting such triggers…

Cryptography and Security · Computer Science 2026-03-12 Eirik Høyheim, Magnus Wiik Eckhoff, Gudmund Grov, Robert Flood, David Aspinall

Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System

In recent years, the security issues of artificial intelligence have become increasingly prominent due to the rapid development of deep learning research and applications. Backdoor attack is an attack targeting the vulnerability of deep…

Cryptography and Security · Computer Science 2023-12-14 Peixin Zhang, Jun Sun, Mingtian Tan, Xinyu Wang

A Survey on Backdoor Attack and Defense in Natural Language Processing

Deep learning is becoming increasingly popular in real-life applications, especially in natural language processing (NLP). Users often choose training outsourcing or adopt third-party data and models due to data and computation resources…

Computation and Language · Computer Science 2022-11-23 Xuan Sheng, Zhaoyang Han, Piji Li, Xiangmao Chang

When and How to Fool Explainable Models (and Humans) with Adversarial Examples

Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations. Some of the main shortcomings are the lack of interpretability and the lack of robustness against adversarial…

Machine Learning · Computer Science 2025-02-18 Jon Vadillo, Roberto Santana, Jose A. Lozano

Deep Learning Backdoors

Intuitively, a backdoor attack against Deep Neural Networks (DNNs) is to inject hidden malicious behaviors into DNNs such that the backdoor model behaves legitimately for benign inputs, yet invokes a predefined malicious behavior when its…

Cryptography and Security · Computer Science 2021-02-09 Shaofeng Li, Shiqing Ma, Minhui Xue, Benjamin Zi Hao Zhao

Label-Consistent Backdoor Attacks

Deep neural networks have been demonstrated to be vulnerable to backdoor attacks. Specifically, by injecting a small number of maliciously constructed inputs into the training set, an adversary is able to plant a backdoor into the trained…

Machine Learning · Statistics 2019-12-10 Alexander Turner, Dimitris Tsipras, Aleksander Madry

Planting Undetectable Backdoors in Machine Learning Models

Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a…

Machine Learning · Computer Science 2024-11-12 Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, Or Zamir

Demystifying Poisoning Backdoor Attacks from a Statistical Perspective

The growing dependence on machine learning in real-world applications emphasizes the importance of understanding and ensuring its safety. Backdoor attacks pose a significant security risk due to their stealthy nature and potentially serious…

Cryptography and Security · Computer Science 2023-10-19 Ganghua Wang, Xun Xian, Jayanth Srinivasa, Ashish Kundu, Xuan Bi, Mingyi Hong, Jie Ding

Mitigating Backdoor Attacks using Activation-Guided Model Editing

Backdoor attacks compromise the integrity and reliability of machine learning models by embedding a hidden trigger during the training process, which can later be activated to cause unintended misbehavior. We propose a novel backdoor…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Felix Hsieh, Huy H. Nguyen, AprilPyone MaungMaung, Dmitrii Usynin, Isao Echizen

Stealthy Backdoors as Compression Artifacts

In a backdoor attack on a machine learning model, an adversary produces a model that performs well on normal inputs but outputs targeted misclassifications on inputs containing a small trigger pattern. Model compression is a widely used…

Cryptography and Security · Computer Science 2021-05-03 Yulong Tian, Fnu Suya, Fengyuan Xu, David Evans

When Forgetting Triggers Backdoors: A Clean Unlearning Attack

Machine unlearning has emerged as a key component in ensuring the "Right to be Forgotten", enabling the removal of specific data points from trained models. However, even when the unlearning is performed without poisoning the forget-set…

Cryptography and Security · Computer Science 2025-06-17 Marco Arazzi, Antonino Nocera, Vinod P

Backdoor Attacks on Multi-modal Contrastive Learning

Contrastive learning has become a leading self-supervised approach to representation learning across domains, including vision, multimodal settings, graphs, and federated learning. However, recent studies have shown that contrastive…

Machine Learning · Computer Science 2026-01-19 Simi D Kuniyilh, Rita Machacy

Backdoor Attack against NLP models with Robustness-Aware Perturbation defense

Backdoor attack intends to embed hidden backdoor into deep neural networks (DNNs), such that the attacked model performs well on benign samples, whereas its prediction will be maliciously changed if the hidden backdoor is activated by the…

Cryptography and Security · Computer Science 2022-04-13 Shaik Mohammed Maqsood, Viveros Manuela Ceron, Addluri GowthamKrishna