Related papers: NesPrInDT: Nested undersampling in PrInDT

Repeated undersampling in PrInDT (RePrInDT): Variation in undersampling and prediction, and ranking of predictors in ensembles

In this paper, we extend our PrInDT method (Weihs & Buschfeld 2021a) towards undersampling with different percentages of the smaller and the larger classes (psmall and plarge), stratification of predictors, varying the prediction threshold,…

Applications · Statistics 2021-08-12 Claus Weihs , Sarah Buschfeld

Influence of Resampling on Accuracy of Imbalanced Classification

In many real-world binary classification tasks (e.g. detection of certain objects from images), an available dataset is imbalanced, i.e., it has much less representatives of a one class (a minor class), than of another. Generally, accurate…

Machine Learning · Statistics 2017-07-14 Evgeny Burnaev , Pavel Erofeev , Artem Papanov

Neural Network Based Undersampling Techniques

Class imbalance problem is commonly faced while developing machine learning models for real-life issues. Due to this problem, the fitted model tends to be biased towards the majority class data, which leads to lower precision, recall, AUC,…

Machine Learning · Computer Science 2019-08-20 Md. Adnan Arefeen , Sumaiya Tabassum Nimi , M Sohel Rahman

Optimal Downsampling for Imbalanced Classification with Generalized Linear Models

Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We…

Machine Learning · Statistics 2025-05-20 Yan Chen , Jose Blanchet , Krzysztof Dembczynski , Laura Fee Nern , Aaron Flores

Learning Classifiers for Imbalanced and Overlapping Data

This study is about inducing classifiers using data that is imbalanced, with a minority class being under-represented in relation to the majority classes. The first section of this research focuses on the main characteristics of data that…

Machine Learning · Computer Science 2022-10-25 Shivaditya Shivganesh , Nitin Narayanan N , Pranav Murali , Ajaykumar M

Clustering and Learning from Imbalanced Data

A learning classifier must outperform a trivial solution, in case of imbalanced data, this condition usually does not hold true. To overcome this problem, we propose a novel data level resampling method - Clustering Based Oversampling for…

Machine Learning · Computer Science 2018-11-13 Naman D. Singh , Abhinav Dhall

A Bilevel Optimization Framework for Imbalanced Data Classification

Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new…

Machine Learning · Computer Science 2025-07-11 Karen Medlin , Sven Leyffer , Krishnan Raghavan

Survey of resampling techniques for improving classification performance in unbalanced datasets

A number of classification problems need to deal with data imbalance between classes. Often it is desired to have a high recall on the minority class while maintaining a high precision on the majority class. In this paper, we review a…

Applications · Statistics 2016-08-23 Ajinkya More

Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties

In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation,…

Machine Learning · Computer Science 2022-01-21 Mohamed S. Kraiem , Fernando Sánchez-Hernández , María N. Moreno-García

Minority Class Oversampling for Tabular Data with Deep Generative Models

In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead the practitioners on the model's performance. A…

Machine Learning · Computer Science 2020-07-21 Ramiro Camino , Christian Hammerschmidt , Radu State

Stop Oversampling for Class Imbalance Learning: A Critical Review

For the last two decades, oversampling has been employed to overcome the challenge of learning from imbalanced datasets. Many approaches to solving this challenge have been offered in the literature. Oversampling, on the other hand, is a…

Machine Learning · Computer Science 2022-06-09 Ahmad B. Hassanat , Ahmad S. Tarawneh , Ghada A. Altarawneh , Abdullah Almuhaimeed

Balanced Subsampling for Big Data with Categorical Covariates

Supervised learning under measurement constraints is a common challenge in statistical and machine learning. In many applications, despite extensive design points, acquiring responses for all points is often impractical due to resource…

Methodology · Statistics 2025-03-19 Lin Wang

Restoring balance: principled under/oversampling of data for optimal classification

Class imbalance in real-world data poses a common bottleneck for machine learning tasks, since achieving good generalization on under-represented examples is often challenging. Mitigation strategies, such as under or oversampling the data…

Disordered Systems and Neural Networks · Physics 2025-02-03 Emanuele Loffredo , Mauro Pastore , Simona Cocco , Rémi Monasson

Less Is Better: Unweighted Data Subsampling via Influence Function

In the time of Big Data, training complex models on large-scale data sets is challenging, making it appealing to reduce data volume for saving computation resources by subsampling. Most previous works in subsampling are weighted methods…

Machine Learning · Computer Science 2021-04-14 Zifeng Wang , Hong Zhu , Zhenhua Dong , Xiuqiang He , Shao-Lun Huang

Superensemble Classifier for Improving Predictions in Imbalanced Datasets

Learning from an imbalanced dataset is a tricky proposition. Because these datasets are biased towards one class, most existing classifiers tend not to perform well on minority class examples. Conventional classifiers usually aim to…

Machine Learning · Computer Science 2022-07-18 Tanujit Chakraborty , Ashis Kumar Chakraborty

Is augmentation effective to improve prediction in imbalanced text datasets?

Imbalanced datasets present a significant challenge for machine learning models, often leading to biased predictions. To address this issue, data augmentation techniques are widely used in natural language processing (NLP) to generate new…

Computation and Language · Computer Science 2023-04-21 Gabriel O. Assunção , Rafael Izbicki , Marcos O. Prates

Combining Prediction and Interpretation in Decision Trees (PrInDT) -- a Linguistic Example

In this paper, we show that conditional inference trees and ensembles are suitable methods for modeling linguistic variation. As against earlier linguistic applications, however, we claim that their suitability is strongly increased if we…

Computation and Language · Computer Science 2021-03-08 Claus Weihs , Sarah Buschfeld

Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data

Data imbalance is common in production data, where controlled production settings require data to fall within a narrow range of variation and data are collected with quality assessment in mind, rather than data analytic insights. This…

Machine Learning · Statistics 2021-12-17 Rune D. Kjærsgaard , Manja G. Grønberg , Line K. H. Clemmensen

Prompt Balance Matters: Understanding How Imbalanced Few-Shot Learning Affects Multilingual Sense Disambiguation in LLMs

Recent advances in Large Language Models (LLMs) have significantly reshaped the landscape of Natural Language Processing (NLP). Among the various prompting techniques, few-shot prompting has gained considerable attention for its…

Computation and Language · Computer Science 2025-10-07 Deshan Sumanathilaka , Nicholas Micallef , Julian Hough

Statistical Undersampling with Mutual Information and Support Points

Class imbalance and distributional differences in large datasets present significant challenges for classification tasks machine learning, often leading to biased models and poor predictive performance for minority classes. This work…

Machine Learning · Statistics 2024-12-20 Alex Mak , Shubham Sahoo , Shivani Pandey , Yidan Yue , Linglong Kong