Related papers: Influence-Driven Data Poisoning in Graph-Based Sem…

Poisoning the Unlabeled Dataset of Semi-Supervised Learning

Semi-supervised machine learning models learn from a (small) set of labeled training examples, and a (large) set of unlabeled training examples. State-of-the-art models can reach within a few percentage points of fully-supervised training,…

Machine Learning · Computer Science 2021-08-11 Nicholas Carlini

Poisoning Semi-supervised Federated Learning via Unlabeled Data: Attacks and Defenses

Semi-supervised Federated Learning (SSFL) has recently drawn much attention due to its practical consideration, i.e., the clients may only have unlabeled data. In practice, these SSFL systems implement semi-supervised training by assigning…

Machine Learning · Computer Science 2022-05-10 Yi Liu, Xingliang Yuan, Ruihui Zhao, Cong Wang, Dusit Niyato, Yefeng Zheng

A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning

In this paper, we proposed a general framework for data poisoning attacks to graph-based semi-supervised learning (G-SSL). In this framework, we first unify different tasks, goals, and constraints into a single formula for data poisoning…

Machine Learning · Computer Science 2019-11-01 Xuanqing Liu, Si Si, Xiaojin Zhu, Yang Li, Cho-Jui Hsieh

Rethinking Backdoor Data Poisoning Attacks in the Context of Semi-Supervised Learning

Semi-supervised learning methods can train high-accuracy machine learning models with a fraction of the labeled training samples required for traditional supervised learning. Such methods do not typically involve close review of the…

Machine Learning · Computer Science 2022-12-07 Marissa Connor, Vincent Emanuele

The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning

Semi-supervised machine learning (SSL) is gaining popularity as it reduces the cost of training ML models. It does so by using very small amounts of (expensive, well-inspected) labeled data and large amounts of (cheap, non-inspected)…

Cryptography and Security · Computer Science 2022-11-02 Virat Shejwalkar, Lingjuan Lyu, Amir Houmansadr

Gradient-based Data Subversion Attack Against Binary Classifiers

Machine learning based data-driven technologies have shown impressive performances in a variety of application domains. Most enterprises use data from multiple sources to provide quality applications. The reliability of the external data…

Machine Learning · Computer Science 2021-06-01 Rosni K Vasu, Sanjay Seetharaman, Shubham Malaviya, Manish Shukla, Sachin Lodha

Graph-based Semi-supervised Learning: A Comprehensive Review

Semi-supervised learning (SSL) has tremendous value in practice due to its ability to utilize both labeled data and unlabelled data. An important class of SSL methods is to naturally represent data as graphs such that the label information…

Machine Learning · Computer Science 2021-03-01 Zixing Song, Xiangli Yang, Zenglin Xu, Irwin King

Label Sanitization against Label Flipping Poisoning Attacks

Many machine learning systems rely on data collected in the wild from untrusted sources, exposing the learning algorithms to data poisoning. Attackers can inject malicious data in the training dataset to subvert the learning process,…

Machine Learning · Statistics 2018-10-04 Andrea Paudice, Luis Muñoz-González, Emil C. Lupu

Data driven semi-supervised learning

We consider a novel data driven approach for designing learning algorithms that can effectively learn with only a small number of labeled examples. This is crucial for modern machine learning applications where labels are scarce or…

Machine Learning · Computer Science 2021-10-01 Maria-Florina Balcan, Dravyansh Sharma

Analysis of Label-Flip Poisoning Attack on Machine Learning Based Malware Detector

With the increase in machine learning (ML) applications in different domains, incentives for deceiving these models have reached more than ever. As data is the core backbone of ML algorithms, attackers shifted their interest toward…

Cryptography and Security · Computer Science 2023-01-04 Kshitiz Aryal, Maanak Gupta, Mahmoud Abdelsalam

Amplifying Membership Exposure via Data Poisoning

As in-the-wild data are increasingly involved in the training stage, machine learning applications become more susceptible to data poisoning attacks. Such attacks typically lead to test-time accuracy degradation or controlled misprediction.…

Cryptography and Security · Computer Science 2022-11-02 Yufei Chen, Chao Shen, Yun Shen, Cong Wang, Yang Zhang

Semi-Supervised Learning under General Causal Models

Semi-supervised learning (SSL) aims to train a machine learning model using both labelled and unlabelled data. While the unlabelled data have been used in various ways to improve the prediction accuracy, the reason why unlabelled data could…

Machine Learning · Statistics 2025-10-28 Archer Moore, Heejung Shim, Jingge Zhu, Mingming Gong

Are labels informative in semi-supervised learning? -- Estimating and leveraging the missing-data mechanism

Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models, but it can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled…

Machine Learning · Statistics 2023-02-16 Aude Sportisse, Hugo Schmutz, Olivier Humbert, Charles Bouveyron, Pierre-Alexandre Mattei

Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

Data poisoning is an attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores poisoning attacks on neural nets. The proposed attacks…

Machine Learning · Computer Science 2018-11-13 Ali Shafahi, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, Tom Goldstein

Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning

Self-supervised learning (SSL) has revolutionized learning from large-scale unlabeled datasets, yet the intrinsic relationship between pretraining data and the learned representations remains poorly understood. Traditional supervised…

Machine Learning · Computer Science 2024-12-24 Nidhin Harilal, Amit Kiran Rege, Reza Akbarian Bafghi, Maziar Raissi, Claire Monteleoni

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning

Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper we study…

Machine Learning · Computer Science 2020-10-30 Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing

Informative missingness and its implications in semi-supervised learning

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance…

Machine Learning · Statistics 2025-12-29 Jinran Wu, You-Gan Wang, Geoffrey J. McLachlan

Poisoning Attacks against Data-Driven Control Methods

This paper investigates poisoning attacks against data-driven control methods. This work is motivated by recent trends showing that, in supervised learning, slightly modifying the data in a malicious manner can drastically deteriorate the…

Systems and Control · Electrical Eng. & Systems 2021-03-11 Alessio Russo, Alexandre Proutiere

FlexSSL: A Generic and Efficient Framework for Semi-Supervised Learning

Semi-supervised learning holds great promise for many real-world applications, due to its ability to leverage both unlabeled and expensive labeled data. However, most semi-supervised learning algorithms still heavily rely on the limited…

Machine Learning · Computer Science 2023-12-29 Huiling Qin, Xianyuan Zhan, Yuanxun Li, Yu Zheng

Unsupervised Selective Labeling for More Effective Semi-Supervised Learning

Given an unlabeled dataset and an annotation budget, we study how to selectively label a fixed number of instances so that semi-supervised learning (SSL) on such a partially labeled dataset is most effective. We focus on selecting the right…

Machine Learning · Computer Science 2023-08-24 Xudong Wang, Long Lian, Stella X. Yu