Related papers: For self-supervised learning, Rationality implies …

Generalization by Recognizing Confusion

A recently-proposed technique called self-adaptive training augments modern neural networks by allowing them to adjust training labels on the fly, to avoid overfitting to samples that may be mislabeled or otherwise non-representative. By…

Machine Learning · Computer Science · 2020-06-16 · Daniel Chiu, Franklyn Wang, Scott Duke Kominers

Improving Generalization by Controlling Label-Noise Information in Neural Network Weights

In the presence of noisy or incorrect labels, neural networks have the undesirable tendency to memorize information about the noise. Standard regularization techniques such as dropout, weight decay or data augmentation sometimes help, but…

Machine Learning · Computer Science · 2020-11-23 · Hrayr Harutyunyan, Kyle Reing, Greg Ver Steeg, Aram Galstyan

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee

Over-parameterized deep neural networks trained by simple first-order methods are known to be able to fit any labeling of data. Such over-fitting ability hinders generalization when mislabeled training examples are present. On the other…

Machine Learning · Computer Science · 2020-10-06 · Wei Hu, Zhiyuan Li, Dingli Yu

RATT: Leveraging Unlabeled Data to Guarantee Generalization

To assess generalization, machine learning scientists typically either (i) bound the generalization gap and then (after training) plug in the empirical risk to obtain a bound on the true risk; or (ii) validate empirically on holdout data.…

Machine Learning · Computer Science · 2021-11-09 · Saurabh Garg, Sivaraman Balakrishnan, J. Zico Kolter, Zachary C. Lipton

Labels, Information, and Computation: Efficient Learning Using Sufficient Labels

In supervised learning, obtaining a large set of fully-labeled training data is expensive. We show that we do not always need full label information on every single training example to train a competent classifier. Specifically, inspired by…

Machine Learning · Computer Science · 2023-01-18 · Shiyu Duan, Spencer Chang, Jose C. Principe

Functional Regularization for Representation Learning: A Unified Theoretical Perspective

Unsupervised and self-supervised learning approaches have become a crucial tool to learn representations for downstream prediction tasks. While these approaches are widely used in practice and achieve impressive empirical gains, their…

Machine Learning · Computer Science · 2020-10-23 · Siddhant Garg, Yingyu Liang

Distilling Effective Supervision from Severe Label Noise

Collecting large-scale data with clean labels for supervised training of neural networks is practically challenging. Although noisy labels are usually cheap to acquire, existing methods suffer a lot from label noise. This paper targets…

Machine Learning · Computer Science · 2020-06-16 · Zizhao Zhang, Han Zhang, Sercan O. Arik, Honglak Lee, Tomas Pfister

A soft nearest-neighbor framework for continual semi-supervised learning

Despite significant advances, the performance of state-of-the-art continual learning approaches hinges on the unrealistic scenario of fully labeled data. In this paper, we tackle this challenge and propose an approach for continual…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-12 · Zhiqi Kang, Enrico Fini, Moin Nabi, Elisa Ricci, Karteek Alahari

Robust Learning Under Label Noise With Iterative Noise-Filtering

We consider the problem of training a model under the presence of label noise. Current approaches identify samples with potentially incorrect labels and reduce their influence on the learning process by either assigning lower weights to…

Machine Learning · Computer Science · 2019-06-04 · Duc Tam Nguyen, Thi-Phuong-Nhung Ngo, Zhongyu Lou, Michael Klar, Laura Beggel, Thomas Brox

Gaussian Robust Classification

Supervised learning is all about the ability to generalize knowledge. Specifically, the goal of the learning is to train a classifier using training data, in such a way that it will be capable of classifying new unseen data correctly. In…

Machine Learning · Computer Science · 2011-04-04 · Ido Ginodi, Amir Globerson

Understanding deep learning requires rethinking generalization

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the…

Machine Learning · Computer Science · 2017-02-28 · Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

Exploiting Class Learnability in Noisy Data

In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets…

Machine Learning · Computer Science · 2018-11-16 · Matthew Klawonn, Eric Heim, James Hendler

Arbitrarily Large Labelled Random Satisfiability Formulas for Machine Learning Training

Applying deep learning to solve real-life instances of hard combinatorial problems has tremendous potential. Research in this direction has focused on the Boolean satisfiability (SAT) problem, both because of its theoretical centrality and…

Artificial Intelligence · Computer Science · 2023-06-06 · Dimitris Achlioptas, Amrit Daswaney, Periklis A. Papakonstantinou

Training Classifiers that are Universally Robust to All Label Noise Levels

For classification tasks, deep neural networks are prone to overfitting in the presence of label noise. Although existing methods are able to alleviate this problem at low noise levels, they encounter significant performance reduction at…

Machine Learning · Computer Science · 2021-05-31 · Jingyi Xu, Tony Q. S. Quek, Kai Fong Ernest Chong

Semi-supervised Data Representation via Affinity Graph Learning

We consider the general problem of utilizing both labeled and unlabeled data to improve data representation performance. A new semi-supervised learning framework is proposed by combining manifold regularization and data representation methods…

Machine Learning · Computer Science · 2015-02-16 · Weiya Ren

You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models

Recent advances in large language models have demonstrated the promise of unsupervised reinforcement learning (RL) methods for enhancing reasoning capabilities without external supervision. However, the generalizability of these label-free…

Machine Learning · Computer Science · 2025-11-10 · Shuvendu Roy, Hossein Hajimirsadeghi, Mengyao Zhai, Golnoosh Samei

Unlabeled Data Improves Adversarial Robustness

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap…

Machine Learning · Statistics · 2022-01-14 · Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, John C. Duchi

Label-Free Supervision of Neural Networks with Physics and Domain Knowledge

In many machine learning applications, labeled data is scarce and obtaining more labels is expensive. We introduce a new approach to supervising neural networks by specifying constraints that should hold over the output space, rather than…

Artificial Intelligence · Computer Science · 2016-09-20 · Russell Stewart, Stefano Ermon

To understand deep learning we need to understand kernel learning

Generalization performance of classifiers in deep learning has recently become a subject of intense study. Deep models, typically over-parametrized, tend to fit the training data exactly. Despite this "overfitting", they perform well on…

Machine Learning · Statistics · 2018-06-18 · Mikhail Belkin, Siyuan Ma, Soumik Mandal

Minimum Description Length and Generalization Guarantees for Representation Learning

A major challenge in designing efficient statistical supervised learning algorithms is finding representations that perform well not only on available training samples but also on unseen data. While the study of representation learning has…

Machine Learning · Statistics · 2024-02-06 · Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski