Related papers: Binary classification with corrupted labels

Learning from Multiple Corrupted Sources, with Application to Learning from Label Proportions

We study binary classification in the setting where the learner is presented with multiple corrupted training samples, with possibly different sample sizes and degrees of corruption, and introduce an approach based on minimizing a weighted…

Machine Learning · Statistics 2019-10-11 Clayton Scott, Jianxin Zhang

Corruption Robust Active Learning

We conduct theoretical studies on streaming-based active learning for binary classification under unknown adversarial label corruptions. In this setting, every time before the learner observes a sample, the adversary decides whether to…

Machine Learning · Computer Science 2021-06-22 Yifang Chen, Simon S. Du, Kevin Jamieson

Learning in the Presence of Corruption

In supervised learning one wishes to identify a pattern present in a joint distribution $P$, of instances, label pairs, by providing a function $f$ from instances to labels that has low risk $\mathbb{E}_{P}\ell(y,f(x))$. To do so, the…

Machine Learning · Statistics 2015-07-07 Brendan van Rooyen, Robert C. Williamson

Detecting Corrupted Labels Without Training a Model to Predict

Label noise in real-world datasets encodes wrong correlation patterns and impairs the generalization of deep neural networks (DNNs). It is critical to find efficient ways to detect corrupted patterns. Current methods primarily focus on…

Machine Learning · Computer Science 2022-06-22 Zhaowei Zhu, Zihao Dong, Yang Liu

Corruptions of Supervised Learning Problems: Typology and Mitigations

Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, lacking a unified view of corruption modelization and…

Machine Learning · Computer Science 2026-05-19 Laura Iacovissi, Nan Lu, Robert C. Williamson

Practical estimation of the optimal classification error with soft labels and calibration

While the performance of machine learning systems has experienced significant improvement in recent years, relatively little attention has been paid to the fundamental question: to what extent can we improve our models? This paper provides…

Machine Learning · Computer Science 2026-05-13 Ryota Ushio, Takashi Ishida, Masashi Sugiyama

Hard Samples, Bad Labels: Robust Loss Functions That Know When to Back Off

Incorrectly labelled training data are frustratingly ubiquitous in both benchmark and specially curated datasets. Such mislabelling clearly adversely affects the performance and generalizability of models trained through supervised learning…

Machine Learning · Computer Science 2025-11-27 Nicholas Pellegrino, David Szczecina, Paul Fieguth

Calibration improves detection of mislabeled examples

Mislabeled data is a pervasive issue that undermines the performance of machine learning systems in real-world applications. An effective approach to mitigate this problem is to detect mislabeled instances and subject them to special…

Machine Learning · Computer Science 2025-11-05 Ilies Chibane, Thomas George, Pierre Nodet, Vincent Lemaire

Detecting Mislabeled and Corrupted Data via Pointwise Mutual Information

Deep neural networks can memorize corrupted labels, making data quality critical for model performance, yet real-world datasets are frequently compromised by both label noise and input noise. This paper proposes a mutual information-based…

Machine Learning · Computer Science 2025-08-12 Jinghan Yang, Jiayu Weng

Robust Conformal Outlier Detection under Contaminated Reference Data

Conformal prediction is a flexible framework for calibrating machine learning predictions, providing distribution-free statistical guarantees. In outlier detection, this calibration relies on a reference set of labeled inlier data to…

Machine Learning · Statistics 2025-06-17 Meshi Bashari, Matteo Sesia, Yaniv Romano

Learning with Monotone Adversarial Corruptions

We study the extent to which standard machine learning algorithms rely on exchangeability and independence of data by introducing a monotone adversarial corruption model. In this model, an adversary, upon looking at a "clean" i.i.d.…

Machine Learning · Computer Science 2026-01-06 Kasper Green Larsen, Chirag Pabbaraju, Abhishek Shetty

Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise

The growing importance of massive datasets used for deep learning makes robustness to label noise a critical property for classifiers to have. Sources of label noise include automatic labeling, non-expert labeling, and label corruption by…

Machine Learning · Computer Science 2019-01-30 Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel

Optimal Robust Estimation under Local and Global Corruptions: Stronger Adversary and Smaller Error

Algorithmic robust statistics has traditionally focused on the contamination model where a small fraction of the samples are arbitrarily corrupted. We consider a recent contamination model that combines two kinds of corruptions: (i) small…

Data Structures and Algorithms · Computer Science 2024-10-23 Thanasis Pittas, Ankit Pensia

Classification with imperfect training labels

We study the effect of imperfect training data labels on the performance of classification methods. In a general setting, where the probability that an observation in the training dataset is mislabelled may depend on both the feature vector…

Statistics Theory · Mathematics 2019-05-07 Timothy I. Cannings, Yingying Fan, Richard J. Samworth

Fair Classification with Group-Dependent Label Noise

This work examines how to train fair classifiers in settings where training labels are corrupted with random noise, and where the error rates of corruption depend both on the label class and on the membership function for a protected…

Machine Learning · Computer Science 2021-02-18 Jialu Wang, Yang Liu, Caleb Levy

Classification with Noisy Labels by Importance Reweighting

In this paper, we study a classification problem in which sample labels are randomly corrupted. In this scenario, there is an unobservable sample with noise-free labels. However, before being observed, the true labels are independently…

Machine Learning · Statistics 2015-07-21 Tongliang Liu, Dacheng Tao

Learning with Bad Training Data via Iterative Trimmed Loss Minimization

In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted. We first make a simple observation: in a variety of such settings, the…

Machine Learning · Computer Science 2019-02-20 Yanyao Shen, Sujay Sanghavi

Robust Training under Label Noise by Over-parameterization

Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performances of modern machine learning. However, when the training data is corrupted, it has been well-known…

Machine Learning · Computer Science 2022-08-04 Sheng Liu, Zhihui Zhu, Qing Qu, Chong You

Trimmed Maximum Likelihood Estimation for Robust Learning in Generalized Linear Models

We study the problem of learning generalized linear models under adversarial corruptions. We analyze a classical heuristic called the iterative trimmed maximum likelihood estimator which is known to be effective against label corruptions in…

Machine Learning · Computer Science 2022-10-25 Pranjal Awasthi, Abhimanyu Das, Weihao Kong, Rajat Sen

Learning from Complementary Labels

Collecting labeled data is costly and thus a critical bottleneck in real-world classification tasks. To mitigate this problem, we propose a novel setting, namely learning from complementary labels for multi-class classification. A…

Machine Learning · Statistics 2017-11-15 Takashi Ishida, Gang Niu, Weihua Hu, Masashi Sugiyama