Related papers: Interpolation Learning With Minimum Description Le…
We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of…
To measure how well pretrained representations encode some linguistic property, it is common to use accuracy of a probe, i.e. a classifier trained to predict the property from the representations. Despite widespread adoption of probes,…
We examine the necessity of interpolation in overparameterized models, that is, when achieving optimal predictive risk in machine learning problems requires (nearly) interpolating the training data. In particular, we consider simple…
We provide matching upper and lower bounds of order $\sigma^2/\log(d/n)$ for the prediction error of the minimum $\ell_1$-norm interpolator, a.k.a. basis pursuit. Our result is tight up to negligible terms when $d \gg n$, and is the first…
A major challenge in designing efficient statistical supervised learning algorithms is finding representations that perform well not only on available training samples but also on unseen data. While the study of representation learning has…
Understanding how overparameterized neural networks generalize despite perfect interpolation of noisy training data is a fundamental question. Mallinar et. al. 2022 noted that neural networks seem to often exhibit ``tempered overfitting'',…
In supervised learning, it has been shown that label noise in the data can be interpolated without penalties on test accuracy. We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem showing the…
Many modern machine learning models are trained to achieve zero or near-zero training error in order to obtain near-optimal (but non-zero) test error. This phenomenon of strong generalization performance for "overfitted" / interpolated…
Learning with label dependent label noise has been extensively explored in both theory and practice; however, dealing with instance (i.e., feature) and label dependent label noise continues to be a challenging task. The difficulty arises…
An evolving line of machine learning works observe empirical evidence that suggests interpolating estimators -- the ones that achieve zero training error -- may not necessarily be harmful. This paper pursues theoretical understanding for an…
Collecting labeled data to train deep neural networks is costly and even impractical for many tasks. Thus, research effort has been focused in automatically curated datasets or unsupervised and weakly supervised learning. The common problem…
We consider the problem of training a model under the presence of label noise. Current approaches identify samples with potentially incorrect labels and reduce their influence on the learning process by either assigning lower weights to…
Learning with reduced labeling standards, such as noisy label, partial label, and multiple label candidates, which we generically refer to as \textit{imprecise} labels, is a commonplace challenge in machine learning tasks. Previous methods…
Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of unclassified data, to perform a classification in situations when, typically, there is little labeled data. Even though this is not…
In supervised classification tasks, models are trained to predict a label for each data point. In real-world datasets, these labels are often noisy due to annotation errors. While the impact of label noise on the performance of deep…
Semi-supervised learning (SSL) has long been proved to be an effective technique to construct powerful models with limited labels. In the existing literature, consistency regularization-based methods, which force the perturbed samples to…
We show that label noise exists in adversarial training. Such label noise is due to the mismatch between the true label distribution of adversarial examples and the label inherited from clean examples - the true label distribution is…
This paper addresses a regression problem in which output label values are the results of sensing the magnitude of a phenomenon. A low value of such labels can mean either that the actual magnitude of the phenomenon was low or that the…
It is shown that the two-part Minimum Description Length Principle can be used to discriminate among different models that can explain a given observed dataset. The description length is chosen to be the sum of the lengths of the message…
We study the effect of imperfect training data labels on the performance of classification methods. In a general setting, where the probability that an observation in the training dataset is mislabelled may depend on both the feature vector…