Related papers: Improving Multi-class Classifier Using Likelihood …

A replica analysis of under-bagging

Under-bagging (UB), which combines under-sampling and bagging, is a popular ensemble learning method for training classifiers on imbalanced data. Using bagging to reduce the increased variance caused by the reduction in sample size due…

Machine Learning · Statistics 2025-05-19 Takashi Takahashi

Unbiased Bregman-Risk Estimators: Application to Regularization Parameter Selection in Tomographic Image Reconstruction

Unbiased estimators are introduced for averaged Bregman divergences which generalize Stein's Unbiased (Predictive) Risk Estimator, and the minimization of these estimators is proposed as a regularization parameter selection method for…

Numerical Analysis · Mathematics 2021-11-22 Elias S. Helou, Sandra A. Santos, Lucas E. A. Simões

Nearest Neighbor Classification based on Imbalanced Data: A Statistical Approach

When the competing classes in a classification problem are not of comparable size, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbor classifier is no exception. To take care of this problem, we develop…

Methodology · Statistics 2023-11-02 Anvit Garg, Anil K. Ghosh, Soham Sarkar

Optimal Binary Classification Beyond Accuracy

The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most famously in…

Statistics Theory · Mathematics 2022-09-27 Shashank Singh, Justin Khim

Imbalanced multi-label classification using multi-task learning with extractive summarization

Extractive summarization and imbalanced multi-label classification often require vast amounts of training data to avoid overfitting. In situations where training data is expensive to generate, leveraging information between tasks is an…

Computation and Language · Computer Science 2019-03-19 John Brandt

Probabilistic Consistency in Machine Learning and Its Connection to Uncertainty Quantification

Machine learning (ML) is often viewed as a powerful data analysis tool that is easy to learn because of its black-box nature. Yet this very nature also makes it difficult to quantify confidence in predictions extracted from ML models, and…

Machine Learning · Computer Science 2025-09-30 Paul Patrone, Anthony Kearsley

M2m: Imbalanced Classification via Major-to-minor Translation

In most real-world scenarios, labeled training datasets are highly class-imbalanced, where deep neural networks suffer from generalizing to a balanced testing criterion. In this paper, we explore a novel yet simple way to alleviate this…

Computer Vision and Pattern Recognition · Computer Science 2020-12-22 Jaehyung Kim, Jongheon Jeong, Jinwoo Shin

Revisiting Explicit Regularization in Neural Networks for Well-Calibrated Predictive Uncertainty

From the statistical learning perspective, complexity control via explicit regularization is a necessity for improving the generalization of over-parameterized models. However, the impressive generalization performance of neural networks…

Machine Learning · Computer Science 2021-02-09 Taejong Joo, Uijung Chung

A Generalized Unbiased Risk Estimator for Learning with Augmented Classes

In contrast to the standard learning paradigm where all classes can be observed in training data, learning with augmented classes (LAC) tackles the problem where augmented classes unobserved in the training data may emerge in the test…

Machine Learning · Computer Science 2023-06-13 Senlin Shu, Shuo He, Haobo Wang, Hongxin Wei, Tao Xiang, Lei Feng

Review of Methods for Handling Class-Imbalanced in Classification Problems

Learning classifiers using skewed or imbalanced datasets can occasionally lead to classification issues; this is a serious issue. In some cases, one class contains the majority of examples while the other, which is frequently the more…

Machine Learning · Computer Science 2022-11-11 Satyendra Singh Rawat, Amit Kumar Mishra

Unbiased Prevalence Estimation with Multicalibrated LLMs

Estimating the prevalence of a category in a population using imperfect measurement devices (diagnostic tests, classifiers, or large language models) is fundamental to science, public health, and online trust and safety. Standard approaches…

Artificial Intelligence · Computer Science 2026-04-24 Fridolin Linder, Thomas Leeper, Daniel Haimovich, Niek Tax, Lorenzo Perini, Milan Vojnovic

Imbalanced Classification via Explicit Gradient Learning From Augmented Data

Learning from imbalanced data is one of the most significant challenges in real-world classification tasks. In such cases, neural networks' performance is substantially impaired due to preference towards the majority class. Existing…

Machine Learning · Computer Science 2022-11-13 Bronislav Yasinnik, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch

Efficient Set-Valued Prediction in Multi-Class Classification

In cases of uncertainty, a multi-class classifier preferably returns a set of candidate classes instead of predicting a single class label with little guarantee. More precisely, the classifier should strive for an optimal balance between…

Machine Learning · Computer Science 2020-05-28 Thomas Mortier, Marek Wydmuch, Krzysztof Dembczyński, Eyke Hüllermeier, Willem Waegeman

Adaptive Ensemble of Classifiers with Regularization for Imbalanced Data Classification

The dynamic ensemble selection of classifiers is an effective approach for processing label-imbalanced data classifications. However, such a technique is prone to overfitting, owing to the lack of regularization methods and the dependence…

Machine Learning · Computer Science 2020-11-09 Chen Wang, Chengyuan Deng, Zhoulu Yu, Dafeng Hui, Xiaofeng Gong, Ruisen Luo

A weighted-likelihood framework for class imbalance in Bayesian prediction models

Class imbalance is a pervasive problem in predictive toxicology, where the number of non-toxic compounds often exceeds the number of toxic ones. Models trained on such data often perform well on the majority class but poorly on the minority…

Applications · Statistics 2025-10-10 Stanley E. Lazic

UniMLR: Modeling Implicit Class Significance for Multi-Label Ranking

Existing multi-label ranking (MLR) frameworks only exploit information deduced from the bipartition of labels into positive and negative sets. Therefore, they do not benefit from ranking among positive labels, which is the novel MLR…

Machine Learning · Computer Science 2025-09-12 V. Bugra Yesilkaynak, Emine Dari, Alican Mertan, Gozde Unal

Exploring Imbalanced Annotations for Effective In-Context Learning

Large language models (LLMs) have shown impressive performance on downstream tasks through in-context learning (ICL), which heavily relies on the demonstrations selected from annotated datasets. However, these datasets often exhibit…

Computation and Language · Computer Science 2025-06-02 Hongfu Gao, Feipeng Zhang, Hao Zeng, Deyu Meng, Bingyi Jing, Hongxin Wei

Towards Competitive Classifiers for Unbalanced Classification Problems: A Study on the Performance Scores

Although a great methodological effort has been invested in proposing competitive solutions to the class-imbalance problem, little effort has been made in pursuing a theoretical understanding of this matter. In order to shed some light on…

Machine Learning · Statistics 2016-09-04 Jonathan Ortigosa-Hernández, Iñaki Inza, Jose A. Lozano

Ultra-imbalanced classification guided by statistical information

Imbalanced data are frequently encountered in real-world classification tasks. Previous works on imbalanced learning mostly focused on learning with a minority class of few samples. However, the notion of imbalance also applies to cases…

Machine Learning · Computer Science 2024-09-09 Yin Jin, Ningtao Wang, Ruofan Wu, Pengfei Shi, Xing Fu, Weiqiang Wang

VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning

Active Learning for discriminative models has largely been studied with the focus on individual samples, with less emphasis on how classes are distributed or which classes are hard to deal with. In this work, we show that this is harmful.…

Machine Learning · Computer Science 2020-12-04 Jongwon Choi, Kwang Moo Yi, Jihoon Kim, Jinho Choo, Byoungjip Kim, Jin-Yeop Chang, Youngjune Gwon, Hyung Jin Chang