Related papers: Instance Selection Improves Geometric Mean Accurac…

An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification

Learning from imbalanced data is a challenging task. Standard classification algorithms tend to perform poorly when trained on imbalanced data. Some special strategies need to be adopted, either by modifying the data distribution or by…

Machine Learning · Computer Science 2022-08-26 Asif Newaz, Shahriar Hassan, Farhan Shahriyar Haq

On dynamic ensemble selection and data preprocessing for multi-class imbalance learning

Class-imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the…

Machine Learning · Statistics 2018-11-30 Rafael M. O. Cruz, Robert Sabourin, George D. C. Cavalcanti

A Survey of Methods for Managing the Classification and Solution of Data Imbalance Problem

The problem of class imbalance is extensive for focusing on numerous applications in the real world. In such a situation, nearly all of the examples are labeled as one class called majority class, while far fewer examples are labeled as the…

Machine Learning · Computer Science 2020-12-23 Khan Md. Hasib, Md. Sadiq Iqbal, Faisal Muhammad Shah, Jubayer Al Mahmud, Mahmudul Hasan Popel, Md. Imran Hossain Showrov, Shakil Ahmed, Obaidur Rahman

Gamma distribution-based sampling for imbalanced data

Imbalanced class distribution is a common problem in a number of fields including medical diagnostics, fraud detection, and others. It causes bias in classification algorithms leading to poor performance on the minority class data. In this…

Machine Learning · Computer Science 2020-09-23 Firuz Kamalov, Dmitry Denisov

Comparative Analysis of Data Preprocessing Methods, Feature Selection Techniques and Machine Learning Models for Improved Classification and Regression Performance on Imbalanced Genetic Data

Rapid advancements in genome sequencing have led to the collection of vast amounts of genomics data. Researchers may be interested in using machine learning models on such data to predict the pathogenicity or clinical significance of a…

Quantitative Methods · Quantitative Biology 2024-08-15 Arshmeet Kaur, Morteza Sarmadi

A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS)

This paper looks into the problem of handling imbalanced data in a multi-label classification problem. The problem is solved by proposing two novel methods that primarily exploit the geometric relationship between the feature vectors. The…

Machine Learning · Computer Science 2020-10-13 Anima Majumder, Samrat Dutta, Swagat Kumar, Laxmidhar Behera

Resampling strategies for imbalanced regression: a survey and empirical analysis

Imbalanced problems can arise in different real-world situations, and to address this, certain strategies in the form of resampling or balancing algorithms are proposed. This issue has largely been studied in the context of classification,…

Machine Learning · Computer Science 2025-07-17 Juscimara G. Avelino, George D. C. Cavalcanti, Rafael M. O. Cruz

A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation

Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other. Ensemble learning combines multiple models to obtain a robust model and has been prominently used with…

Machine Learning · Computer Science 2023-11-28 Azal Ahmad Khan, Omkar Chaudhari, Rohitash Chandra

An empirical evaluation of imbalanced data strategies from a practitioner's point of view

This paper evaluates six strategies for mitigating imbalanced data: oversampling, undersampling, ensemble methods, specialized algorithms, class weight adjustments, and a no-mitigation approach referred to as the baseline. These strategies…

Machine Learning · Computer Science 2023-11-13 Jacques Wainer

Balanced Split: A new train-test data splitting strategy for imbalanced datasets

Classification data sets with skewed class proportions are called imbalanced. Class imbalance is a problem since most machine learning classification algorithms are built with an assumption of equal representation of all classes in the…

Machine Learning · Computer Science 2022-12-22 Azal Ahmad Khan

Cost-Sensitive Feature Selection by Optimizing F-Measures

Feature selection is beneficial for improving the performance of general machine learning tasks by extracting an informative subset from the high-dimensional features. Conventional feature selection methods usually ignore the class…

Computer Vision and Pattern Recognition · Computer Science 2019-04-05 Meng Liu, Chang Xu, Yong Luo, Chao Xu, Yonggang Wen, Dacheng Tao

Box Drawings for Learning with Imbalanced Data

The vast majority of real world classification problems are imbalanced, meaning there are far fewer data from the class of interest (the positive class) than from other classes. We propose two machine learning algorithms to handle highly…

Machine Learning · Statistics 2014-06-10 Siong Thye Goh, Cynthia Rudin

ICPRAI 2018 SI: On dynamic ensemble selection and data preprocessing for multi-class imbalance learning

Class-imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the…

Machine Learning · Computer Science 2018-11-30 Rafael M. O. Cruz, Mariana A. Souza, Robert Sabourin, George D. C. Cavalcanti

Class Imbalance Problem in Data Mining Review

In last few years there are major changes and evolution has been done on classification of data. As the application area of technology is increases the size of data also increases. Classification of data becomes difficult because of…

Machine Learning · Computer Science 2013-05-09 Rushi Longadge, Snehalata Dongre

GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets

Imbalanced datasets are ubiquitous. Classification performance on imbalanced datasets is generally poor for the minority class as the classifier cannot learn decision boundaries well. However, in sensitive applications like fraud detection,…

Machine Learning · Computer Science 2019-10-25 Vishwa Karia, Wenhao Zhang, Arash Naeim, Ramin Ramezani

Self-paced Ensemble for Highly Imbalanced Massive Data Classification

Many real-world applications reveal difficulties in learning classifiers from imbalanced data. The rising big data era has been witnessing more classification tasks with large-scale but extremely imbalance and low-quality datasets. Most of…

Machine Learning · Computer Science 2020-10-20 Zhining Liu, Wei Cao, Zhifeng Gao, Jiang Bian, Hechang Chen, Yi Chang, Tie-Yan Liu

Partial Resampling of Imbalanced Data

Imbalanced data is a frequently encountered problem in machine learning. Despite a vast amount of literature on sampling techniques for imbalanced data, there is a limited number of studies that address the issue of the optimal sampling…

Machine Learning · Computer Science 2022-07-12 Firuz Kamalov, Amir F. Atiya, Dina Elreedy

Imbalanced Classification via Explicit Gradient Learning From Augmented Data

Learning from imbalanced data is one of the most significant challenges in real-world classification tasks. In such cases, neural networks performance is substantially impaired due to preference towards the majority class. Existing…

Machine Learning · Computer Science 2022-11-13 Bronislav Yasinnik, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch

Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties

In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation,…

Machine Learning · Computer Science 2022-01-21 Mohamed S. Kraiem, Fernando Sánchez-Hernández, María N. Moreno-García

An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification

In predictive tasks, real-world datasets often present different degrees of imbalanced (i.e., long-tailed or skewed) distributions. While the majority (the head) classes have sufficient samples, the minority (the tail) classes can be…

Machine Learning · Computer Science 2021-09-14 Chongsheng Zhang, Paolo Soda, Jingjun Bi, Gaojuan Fan, George Almpanidis, Salvador Garcia