English
Related papers

Related papers: NesPrInDT: Nested undersampling in PrInDT

200 papers

In this paper, we extend our PrInDT method (Weihs & Buschfeld 2021a) towards undersampling with different percentages of the smaller and the larger classes (psmall and plarge), stratification of predictors, varying the prediction threshold,…

Applications · Statistics 2021-08-12 Claus Weihs , Sarah Buschfeld

In many real-world binary classification tasks (e.g. detection of certain objects from images), an available dataset is imbalanced, i.e., it has much less representatives of a one class (a minor class), than of another. Generally, accurate…

Machine Learning · Statistics 2017-07-14 Evgeny Burnaev , Pavel Erofeev , Artem Papanov

Class imbalance problem is commonly faced while developing machine learning models for real-life issues. Due to this problem, the fitted model tends to be biased towards the majority class data, which leads to lower precision, recall, AUC,…

Machine Learning · Computer Science 2019-08-20 Md. Adnan Arefeen , Sumaiya Tabassum Nimi , M Sohel Rahman

Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We…

Machine Learning · Statistics 2025-05-20 Yan Chen , Jose Blanchet , Krzysztof Dembczynski , Laura Fee Nern , Aaron Flores

This study is about inducing classifiers using data that is imbalanced, with a minority class being under-represented in relation to the majority classes. The first section of this research focuses on the main characteristics of data that…

Machine Learning · Computer Science 2022-10-25 Shivaditya Shivganesh , Nitin Narayanan N , Pranav Murali , Ajaykumar M

A learning classifier must outperform a trivial solution, in case of imbalanced data, this condition usually does not hold true. To overcome this problem, we propose a novel data level resampling method - Clustering Based Oversampling for…

Machine Learning · Computer Science 2018-11-13 Naman D. Singh , Abhinav Dhall

Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new…

Machine Learning · Computer Science 2025-07-11 Karen Medlin , Sven Leyffer , Krishnan Raghavan

A number of classification problems need to deal with data imbalance between classes. Often it is desired to have a high recall on the minority class while maintaining a high precision on the majority class. In this paper, we review a…

Applications · Statistics 2016-08-23 Ajinkya More

In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation,…

Machine Learning · Computer Science 2022-01-21 Mohamed S. Kraiem , Fernando Sánchez-Hernández , María N. Moreno-García

In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead the practitioners on the model's performance. A…

Machine Learning · Computer Science 2020-07-21 Ramiro Camino , Christian Hammerschmidt , Radu State

For the last two decades, oversampling has been employed to overcome the challenge of learning from imbalanced datasets. Many approaches to solving this challenge have been offered in the literature. Oversampling, on the other hand, is a…

Machine Learning · Computer Science 2022-06-09 Ahmad B. Hassanat , Ahmad S. Tarawneh , Ghada A. Altarawneh , Abdullah Almuhaimeed

Supervised learning under measurement constraints is a common challenge in statistical and machine learning. In many applications, despite extensive design points, acquiring responses for all points is often impractical due to resource…

Methodology · Statistics 2025-03-19 Lin Wang

Class imbalance in real-world data poses a common bottleneck for machine learning tasks, since achieving good generalization on under-represented examples is often challenging. Mitigation strategies, such as under or oversampling the data…

Disordered Systems and Neural Networks · Physics 2025-02-03 Emanuele Loffredo , Mauro Pastore , Simona Cocco , Rémi Monasson

In the time of Big Data, training complex models on large-scale data sets is challenging, making it appealing to reduce data volume for saving computation resources by subsampling. Most previous works in subsampling are weighted methods…

Machine Learning · Computer Science 2021-04-14 Zifeng Wang , Hong Zhu , Zhenhua Dong , Xiuqiang He , Shao-Lun Huang

Learning from an imbalanced dataset is a tricky proposition. Because these datasets are biased towards one class, most existing classifiers tend not to perform well on minority class examples. Conventional classifiers usually aim to…

Machine Learning · Computer Science 2022-07-18 Tanujit Chakraborty , Ashis Kumar Chakraborty

Imbalanced datasets present a significant challenge for machine learning models, often leading to biased predictions. To address this issue, data augmentation techniques are widely used in natural language processing (NLP) to generate new…

Computation and Language · Computer Science 2023-04-21 Gabriel O. Assunção , Rafael Izbicki , Marcos O. Prates

In this paper, we show that conditional inference trees and ensembles are suitable methods for modeling linguistic variation. As against earlier linguistic applications, however, we claim that their suitability is strongly increased if we…

Computation and Language · Computer Science 2021-03-08 Claus Weihs , Sarah Buschfeld

Data imbalance is common in production data, where controlled production settings require data to fall within a narrow range of variation and data are collected with quality assessment in mind, rather than data analytic insights. This…

Machine Learning · Statistics 2021-12-17 Rune D. Kjærsgaard , Manja G. Grønberg , Line K. H. Clemmensen

Recent advances in Large Language Models (LLMs) have significantly reshaped the landscape of Natural Language Processing (NLP). Among the various prompting techniques, few-shot prompting has gained considerable attention for its…

Computation and Language · Computer Science 2025-10-07 Deshan Sumanathilaka , Nicholas Micallef , Julian Hough

Class imbalance and distributional differences in large datasets present significant challenges for classification tasks machine learning, often leading to biased models and poor predictive performance for minority classes. This work…

Machine Learning · Statistics 2024-12-20 Alex Mak , Shubham Sahoo , Shivani Pandey , Yidan Yue , Linglong Kong
‹ Prev 1 2 3 10 Next ›