Related papers: Handling Imbalanced Data: A Case Study for Binary …

Imbalanced data preprocessing techniques utilizing local data characteristics

Data imbalance, that is the disproportion between the number of training observations coming from different classes, remains one of the most significant challenges affecting contemporary machine learning. The negative impact of data…

Machine Learning · Computer Science 2021-11-30 Michał Koziarski

Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification

Class imbalance in binary classification tasks remains a significant challenge in machine learning, often resulting in poor performance on minority classes. This study comprehensively evaluates three widely-used strategies for handling…

Machine Learning · Computer Science 2024-10-01 Mohamed Abdelhamid, Abhyuday Desai

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

Imbalance in the proportion of training samples belonging to different classes often degrades the performance of conventional classifiers. This is primarily due to the tendency of the classifier to be biased towards the majority…

Machine Learning · Computer Science 2021-03-30 Ayush Tripathi, Rupayan Chakraborty, Sunil Kumar Kopparapu

Bridging the Gap: Simultaneous Fine Tuning for Data Re-Balancing

There are many real-world classification problems wherein the issue of data imbalance (the case when a data set contains substantially more samples for one/many classes than the rest) is unavoidable. While under-sampling the problematic…

Computer Vision and Pattern Recognition · Computer Science 2018-01-09 John McKay, Isaac Gerg, Vishal Monga

An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification

Learning from imbalanced data is a challenging task. Standard classification algorithms tend to perform poorly when trained on imbalanced data. Some special strategies need to be adopted, either by modifying the data distribution or by…

Machine Learning · Computer Science 2022-08-26 Asif Newaz, Shahriar Hassan, Farhan Shahriyar Haq

Class Imbalance Problem in Data Mining Review

In the last few years, the classification of data has undergone major changes and evolution. As the application area of technology increases, the size of data also increases. Classification of data becomes difficult because of…

Machine Learning · Computer Science 2013-05-09 Rushi Longadge, Snehalata Dongre

Stop Oversampling for Class Imbalance Learning: A Critical Review

For the last two decades, oversampling has been employed to overcome the challenge of learning from imbalanced datasets. Many approaches to solving this challenge have been offered in the literature. Oversampling, on the other hand, is a…

Machine Learning · Computer Science 2022-06-09 Ahmad B. Hassanat, Ahmad S. Tarawneh, Ghada A. Altarawneh, Abdullah Almuhaimeed

A Self-Adaptive Synthetic Over-Sampling Technique for Imbalanced Classification

Traditionally, in supervised machine learning, (a significant) part of the available data (usually 50% to 80%) is used for training and the rest for validation. In many problems, however, the data is highly imbalanced in regard to different…

Machine Learning · Computer Science 2020-04-21 Xiaowei Gu, Plamen P Angelov, Eduardo Almeida Soares

A Synthetic Over-sampling method with Minority and Majority classes for imbalance problems

Class imbalance is a substantial challenge in classifying many real-world cases. Synthetic over-sampling methods have been effective to improve the performance of classifiers for imbalance problems. However, most synthetic over-sampling…

Machine Learning · Computer Science 2021-08-11 Hadi A. Khorshidi, Uwe Aickelin

Bias-Corrected Data Synthesis for Imbalanced Learning

Imbalanced data, where the positive samples represent only a small proportion compared to the negative samples, makes it challenging for classification problems to balance the false positive and false negative rates. A common approach to…

Machine Learning · Statistics 2026-02-17 Pengfei Lyu, Zhengchi Ma, Linjun Zhang, Anru R. Zhang

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to…

Machine Learning · Computer Science 2020-03-06 Felix Last, Georgios Douzas, Fernando Bacao

Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance

Imbalanced classification and spurious correlation are common challenges in data science and machine learning. Both issues are linked to data imbalance, with certain groups of data samples significantly underrepresented, which in turn would…

Machine Learning · Statistics 2026-02-10 Ryumei Nakada, Yichen Xu, Lexin Li, Linjun Zhang

Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning

Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic…

Machine Learning · Computer Science 2022-08-29 Daochen Zha, Kwei-Herng Lai, Qiaoyu Tan, Sirui Ding, Na Zou, Xia Hu

Effective Class-Imbalance learning based on SMOTE and Convolutional Neural Networks

Imbalanced Data (ID) is a problem that deters Machine Learning (ML) models from achieving satisfactory results. ID is the occurrence of a situation where the quantity of the samples belonging to one class outnumbers that of the other by a…

Machine Learning · Computer Science 2022-10-14 Javad Hassannataj Joloudari, Abdolreza Marefat, Mohammad Ali Nematollahi, Solomon Sunday Oyelere, Sadiq Hussain

A Study imbalance handling by various data sampling methods in binary classification

The purpose of this research report is to present our learning curve and exposure to the Machine Learning life cycle, using a Kaggle binary classification data set to explore various techniques from…

Machine Learning · Computer Science 2021-05-25 Mohamed Hamama

Deep Learning Meets Oversampling: A Learning Framework to Handle Imbalanced Classification

Despite extensive research spanning several decades, class imbalance is still considered a profound difficulty for both machine learning and deep learning models. While data oversampling is the foremost technique to address this issue,…

Machine Learning · Computer Science 2025-02-12 Sukumar Kishanthan, Asela Hevapathige

Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties

In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation,…

Machine Learning · Computer Science 2022-01-21 Mohamed S. Kraiem, Fernando Sánchez-Hernández, María N. Moreno-García

A Survey of Methods for Managing the Classification and Solution of Data Imbalance Problem

The problem of class imbalance is extensive for focusing on numerous applications in the real world. In such a situation, nearly all of the examples are labeled as one class called majority class, while far fewer examples are labeled as the…

Machine Learning · Computer Science 2020-12-23 Khan Md. Hasib, Md. Sadiq Iqbal, Faisal Muhammad Shah, Jubayer Al Mahmud, Mahmudul Hasan Popel, Md. Imran Hossain Showrov, Shakil Ahmed, Obaidur Rahman

Synthetic Information towards Maximum Posterior Ratio for deep learning on Imbalanced Data

This study examines the impact of class-imbalanced data on deep learning models and proposes a technique for data balancing by generating synthetic data for the minority class. Unlike random-based oversampling, our method prioritizes…

Machine Learning · Computer Science 2024-02-26 Hung Nguyen, Morris Chang

Concentration and excess risk bounds for imbalanced classification with synthetic oversampling

Synthetic oversampling of minority examples using SMOTE and its variants is a leading strategy for addressing imbalanced classification problems. Despite the success of this approach in practice, its theoretical foundations remain…

Machine Learning · Statistics 2025-10-24 Touqeer Ahmad, Mohammadreza M. Kalan, François Portier, Gilles Stupfler