Related papers: Benchmark of Data Preprocessing Methods for Imbala…

Cyber Security Data Science: Machine Learning Methods and their Performance on Imbalanced Datasets

Cybersecurity has become essential worldwide and at all levels, concerning individuals, institutions, and governments. A basic principle in cybersecurity is to be always alert. Therefore, automation is imperative in processes where the…

Machine Learning · Computer Science 2025-05-08 Mateo Lopez-Ledezma, Gissel Velarde

Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties

In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation,…

Machine Learning · Computer Science 2022-01-21 Mohamed S. Kraiem, Fernando Sánchez-Hernández, María N. Moreno-García

An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification

Learning from imbalanced data is a challenging task. Standard classification algorithms tend to perform poorly when trained on imbalanced data. Some special strategies need to be adopted, either by modifying the data distribution or by…

Machine Learning · Computer Science 2022-08-26 Asif Newaz, Shahriar Hassan, Farhan Shahriyar Haq

Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions

This study conducts a benchmarking study, comparing 23 different statistical and machine learning methods in a credit scoring application. In order to do so, the models' performance is evaluated over four different data sets in combination…

Econometrics · Economics 2019-07-31 Anna Stelzer

Imbalanced data preprocessing techniques utilizing local data characteristics

Data imbalance, that is the disproportion between the number of training observations coming from different classes, remains one of the most significant challenges affecting contemporary machine learning. The negative impact of data…

Machine Learning · Computer Science 2021-11-30 Michał Koziarski

Minority Class Oversampling for Tabular Data with Deep Generative Models

In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead the practitioners on the model's performance. A…

Machine Learning · Computer Science 2020-07-21 Ramiro Camino, Christian Hammerschmidt, Radu State

Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification

Class imbalance in binary classification tasks remains a significant challenge in machine learning, often resulting in poor performance on minority classes. This study comprehensively evaluates three widely-used strategies for handling…

Machine Learning · Computer Science 2024-10-01 Mohamed Abdelhamid, Abhyuday Desai

Data Balancing Strategies: A Systematic Survey of Resampling and Augmentation Methods

Imbalanced datasets, where one class significantly outnumbers others, remain a persistent challenge in machine learning, often biasing predictions toward the majority class and degrading classifier performance. This paper provides a…

Machine Learning · Statistics 2026-04-30 Behnam Yousefimehr, Mehdi Ghatee, Javad Fazli, Shervin Ghaffari, Zahra Rafei, Mohammad Amin Seifi, Sajed Tavakoli, Abolfazl Nikahd, Mahdi Razi Gandomani, Alireza Orouji, Ramtin Mahmoudi Kashani, Sarina Heshmati, Negin Sadat Mousavi

Comparative Analysis of Imbalanced Malware Byteplot Image Classification using Transfer Learning

Cybersecurity is a major concern due to the increasing reliance on technology and interconnected systems. Malware detectors help mitigate cyber-attacks by comparing malware signatures. Machine learning can improve these detectors by…

Machine Learning · Computer Science 2024-01-08 Jayasudha M, Ayesha Shaik, Gaurav Pendharkar, Soham Kumar, Muhesh Kumar B, Sudharshanan Balaji

A Bilevel Optimization Framework for Imbalanced Data Classification

Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new…

Machine Learning · Computer Science 2025-07-11 Karen Medlin, Sven Leyffer, Krishnan Raghavan

An empirical evaluation of imbalanced data strategies from a practitioner's point of view

This paper evaluates six strategies for mitigating imbalanced data: oversampling, undersampling, ensemble methods, specialized algorithms, class weight adjustments, and a no-mitigation approach referred to as the baseline. These strategies…

Machine Learning · Computer Science 2023-11-13 Jacques Wainer

A Study of Data Pre-processing Techniques for Imbalanced Biomedical Data Classification

Biomedical data are widely accepted in developing prediction models for identifying a specific tumor, drug discovery and classification of human cancers. However, previous studies usually focused on different classifiers, and overlook the…

Quantitative Methods · Quantitative Biology 2019-11-05 Shigang Liu, Jun Zhang, Yang Xiang, Wanlei Zhou, Dongxi Xiang

Improved Sampling Techniques for Learning an Imbalanced Data Set

This paper presents the performance of a classifier built using the stackingC algorithm in nine different data sets. Each data set is generated using a sampling technique applied on the original imbalanced data set. Five new sampling…

Machine Learning · Computer Science 2016-01-20 Maureen Lyndel C. Lauron, Jaderick P. Pabico

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to…

Machine Learning · Computer Science 2020-03-06 Felix Last, Georgios Douzas, Fernando Bacao

The choice of scaling technique matters for classification performance

Dataset scaling, also known as normalization, is an essential preprocessing step in a machine learning pipeline. It is aimed at adjusting attributes scales in a way that they all vary within the same range. This transformation is known to…

Machine Learning · Computer Science 2022-12-26 Lucas B. V. de Amorim, George D. C. Cavalcanti, Rafael M. O. Cruz

Impacts of Data Preprocessing and Hyperparameter Optimization on the Performance of Machine Learning Models Applied to Intrusion Detection Systems

In the context of cybersecurity of modern communications networks, Intrusion Detection Systems (IDS) have been continuously improved, many of them incorporating machine learning (ML) techniques to identify threats. Although there are…

Cryptography and Security · Computer Science 2024-07-17 Mateus Guimarães Lima, Antony Carvalho, João Gabriel Álvares, Clayton Escouper das Chagas, Ronaldo Ribeiro Goldschmidt

Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning

Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic…

Machine Learning · Computer Science 2022-08-29 Daochen Zha, Kwei-Herng Lai, Qiaoyu Tan, Sirui Ding, Na Zou, Xia Hu

A systematic study of the class imbalance problem in convolutional neural networks

In this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem…

Computer Vision and Pattern Recognition · Computer Science 2018-10-16 Mateusz Buda, Atsuto Maki, Maciej A. Mazurowski

On dynamic ensemble selection and data preprocessing for multi-class imbalance learning

Class-imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the…

Machine Learning · Statistics 2018-11-30 Rafael M. O. Cruz, Robert Sabourin, George D. C. Cavalcanti

Deep Learning Meets Oversampling: A Learning Framework to Handle Imbalanced Classification

Despite extensive research spanning several decades, class imbalance is still considered a profound difficulty for both machine learning and deep learning models. While data oversampling is the foremost technique to address this issue,…

Machine Learning · Computer Science 2025-02-12 Sukumar Kishanthan, Asela Hevapathige