Related papers: High Dimensional Binary Classification under Label…

Regularized Learning for Domain Adaptation under Label Shifts

We propose Regularized Learning under Label Shifts (RLLS), a principled and practical domain-adaptation algorithm to correct for shifts in the label distribution between a source and a target domain. We first estimate importance weights…

Machine Learning · Computer Science 2020-08-10 Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, Animashree Anandkumar

Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting

Transformers serve as the foundational architecture for many successful large-scale models, demonstrating the ability to overfit the training data while maintaining strong generalization on unseen data, a phenomenon known as benign…

Machine Learning · Computer Science 2025-02-19 Yingying Zhang, Zhenyu Wu, Jian Li, Yong Liu

Multi-class Classification from Multiple Unlabeled Datasets with Partial Risk Regularization

Recent years have witnessed a great success of supervised deep learning, where predictive models were trained from a large amount of fully labeled data. However, in practice, labeling such big data can be very costly and may not even be…

Machine Learning · Computer Science 2022-10-18 Yuting Tang, Nan Lu, Tianyi Zhang, Masashi Sugiyama

Test-time Adaptation with Calibration of Medical Image Classification Nets for Label Distribution Shift

Class distribution plays an important role in learning deep classifiers. When the proportion of each class in the test set differs from the training set, the performance of classification nets usually degrades. Such a label distribution…

Image and Video Processing · Electrical Eng. & Systems 2022-07-12 Wenao Ma, Cheng Chen, Shuang Zheng, Jing Qin, Huimao Zhang, Qi Dou

Coping with Label Shift via Distributionally Robust Optimisation

The label shift problem refers to the supervised learning setting where the train and test label distributions do not match. Existing work addressing label shift usually assumes access to an unlabelled test sample. This sample may be…

Machine Learning · Computer Science 2021-08-18 Jingzhao Zhang, Aditya Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization

We study the use of linear regression for multiclass classification in the over-parametrized regime where some of the training data is mislabeled. In such scenarios it is necessary to add an explicit regularization term, $\lambda f(w)$, for…

Machine Learning · Computer Science 2024-10-14 Reza Ghane, Danil Akhtiamov, Babak Hassibi

Sequential changepoint detection in classification data under label shift

Classifier predictions often rely on the assumption that new observations come from the same distribution as training data. When the underlying distribution changes, so does the optimal classification rule, and performance may degrade. We…

Methodology · Statistics 2021-09-01 Ciaran Evans, Max G'Sell

Regularization via Structural Label Smoothing

Regularization is an effective way to promote the generalization performance of machine learning models. In this paper, we focus on label smoothing, a form of output distribution regularization that prevents overfitting of a neural network…

Machine Learning · Computer Science 2020-07-07 Weizhi Li, Gautam Dasarathy, Visar Berisha

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models

Studies on benign overfitting provide insights for the success of overparameterized deep learning models. In this work, we examine whether overfitting is truly benign in real-world classification tasks. We start with the observation that a…

Machine Learning · Computer Science 2023-04-04 Kaiyue Wen, Jiaye Teng, Jingzhao Zhang

On Regularization and Inference with Label Constraints

Prior knowledge and symbolic rules in machine learning are often expressed in the form of label constraints, especially in structured prediction problems. In this work, we compare two common strategies for encoding label constraints in a…

Machine Learning · Computer Science 2023-07-11 Kaifu Wang, Hangfeng He, Tin D. Nguyen, Piyush Kumar, Dan Roth

Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation

Label shift refers to the phenomenon where the prior class probability p(y) changes between the training and test distributions, while the conditional probability p(x|y) stays fixed. Label shift arises in settings like medical diagnosis,…

Machine Learning · Computer Science 2020-06-30 Amr Alexandari, Anshul Kundaje, Avanti Shrikumar

Synthetic Tabular Data Generation for Imbalanced Classification: The Surprising Effectiveness of an Overlap Class

Handling imbalance in class distribution when building a classifier over tabular data has been a problem of long-standing interest. One popular approach is augmenting the training dataset with synthetically generated data. While classical…

Machine Learning · Computer Science 2025-02-20 Annie D'souza, Swetha M, Sunita Sarawagi

Regularized Linear Regression for Binary Classification

Regularized linear regression is a promising approach for binary classification problems in which the training set has noisy labels since the regularization term can help to avoid interpolating the mislabeled data points. In this paper we…

Machine Learning · Computer Science 2023-11-07 Danil Akhtiamov, Reza Ghane, Babak Hassibi

Binary Classification: Counterbalancing Class Imbalance by Applying Regression Models in Combination with One-Sided Label Shifts

In many real-world pattern recognition scenarios, such as in medical applications, the corresponding classification tasks can be of an imbalanced nature. In the current study, we focus on binary, imbalanced classification tasks, i.e. binary…

Machine Learning · Computer Science 2020-12-01 Peter Bellmann, Heinke Hihn, Daniel A. Braun, Friedhelm Schwenker

Benign Overfitting in Linear Classifiers with a Bias Term

Modern machine learning models with a large number of parameters often generalize well despite perfectly interpolating noisy training data - a phenomenon known as benign overfitting. A foundational explanation for this in linear…

Machine Learning · Statistics 2025-11-18 Yuta Kondo

Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification

While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an undersampled balanced dataset often achieves close to state-of-the-art accuracy across several popular…

Machine Learning · Computer Science 2023-06-21 Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto

Label Alignment Regularization for Distribution Shift

Recent work has highlighted the label alignment property (LAP) in supervised learning, where the vector of all labels in the dataset is mostly in the span of the top few singular vectors of the data matrix. Drawing inspiration from this…

Machine Learning · Computer Science 2024-09-12 Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H. S. Torr, Yangchen Pan

Rethinking the Value of Labels for Improving Class-Imbalanced Learning

Real-world data often exhibits long-tailed distributions with heavy class imbalance, posing great challenges for deep recognition models. We identify a persisting dilemma on the value of labels in the context of imbalanced learning: on the…

Machine Learning · Computer Science 2020-09-29 Yuzhe Yang, Zhi Xu

Model Debiasing by Learnable Data Augmentation

Deep Neural Networks are well known for efficiently fitting training data, yet experiencing poor generalization capabilities whenever some kind of bias dominates over the actual task labels, resulting in models learning "shortcuts". In…

Machine Learning · Computer Science 2024-08-12 Pietro Morerio, Ruggero Ragonesi, Vittorio Murino

Binary Quantification and Dataset Shift: An Experimental Investigation

Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the…

Machine Learning · Computer Science 2023-10-10 Pablo González, Alejandro Moreo, Fabrizio Sebastiani