English
Related papers

Related papers: Benchmark of Data Preprocessing Methods for Imbala…

200 papers

Cybersecurity has become essential worldwide and at all levels, concerning individuals, institutions, and governments. A basic principle in cybersecurity is to be always alert. Therefore, automation is imperative in processes where the…

Machine Learning · Computer Science 2025-05-08 Mateo Lopez-Ledezma , Gissel Velarde

In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation,…

Machine Learning · Computer Science 2022-01-21 Mohamed S. Kraiem , Fernando Sánchez-Hernández , María N. Moreno-García

Learning from imbalanced data is a challenging task. Standard classification algorithms tend to perform poorly when trained on imbalanced data. Some special strategies need to be adopted, either by modifying the data distribution or by…

Machine Learning · Computer Science 2022-08-26 Asif Newaz , Shahriar Hassan , Farhan Shahriyar Haq

This study conducts a benchmarking study, comparing 23 different statistical and machine learning methods in a credit scoring application. In order to do so, the models' performance is evaluated over four different data sets in combination…

Econometrics · Economics 2019-07-31 Anna Stelzer

Data imbalance, that is the disproportion between the number of training observations coming from different classes, remains one of the most significant challenges affecting contemporary machine learning. The negative impact of data…

Machine Learning · Computer Science 2021-11-30 Michał Koziarski

In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead the practitioners on the model's performance. A…

Machine Learning · Computer Science 2020-07-21 Ramiro Camino , Christian Hammerschmidt , Radu State

Class imbalance in binary classification tasks remains a significant challenge in machine learning, often resulting in poor performance on minority classes. This study comprehensively evaluates three widely-used strategies for handling…

Machine Learning · Computer Science 2024-10-01 Mohamed Abdelhamid , Abhyuday Desai

Imbalanced datasets, where one class significantly outnumbers others, remain a persistent challenge in machine learning, often biasing predictions toward the majority class and degrading classifier performance. This paper provides a…

Cybersecurity is a major concern due to the increasing reliance on technology and interconnected systems. Malware detectors help mitigate cyber-attacks by comparing malware signatures. Machine learning can improve these detectors by…

Machine Learning · Computer Science 2024-01-08 Jayasudha M , Ayesha Shaik , Gaurav Pendharkar , Soham Kumar , Muhesh Kumar B , Sudharshanan Balaji

Data rebalancing techniques, including oversampling and undersampling, are a common approach to addressing the challenges of imbalanced data. To tackle unresolved problems related to both oversampling and undersampling, we propose a new…

Machine Learning · Computer Science 2025-07-11 Karen Medlin , Sven Leyffer , Krishnan Raghavan

This paper evaluates six strategies for mitigating imbalanced data: oversampling, undersampling, ensemble methods, specialized algorithms, class weight adjustments, and a no-mitigation approach referred to as the baseline. These strategies…

Machine Learning · Computer Science 2023-11-13 Jacques Wainer

Biomedical data are widely accepted in developing prediction models for identifying a specific tumor, drug discovery and classification of human cancers. However, previous studies usually focused on different classifiers, and overlook the…

Quantitative Methods · Quantitative Biology 2019-11-05 Shigang Liu , Jun Zhang , Yang Xiang , Wanlei Zhou , Dongxi Xiang

This paper presents the performance of a classifier built using the stackingC algorithm in nine different data sets. Each data set is generated using a sampling technique applied on the original imbalanced data set. Five new sampling…

Machine Learning · Computer Science 2016-01-20 Maureen Lyndel C. Lauron , Jaderick P. Pabico

Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to…

Machine Learning · Computer Science 2020-03-06 Felix Last , Georgios Douzas , Fernando Bacao

Dataset scaling, also known as normalization, is an essential preprocessing step in a machine learning pipeline. It is aimed at adjusting attributes scales in a way that they all vary within the same range. This transformation is known to…

Machine Learning · Computer Science 2022-12-26 Lucas B. V. de Amorim , George D. C. Cavalcanti , Rafael M. O. Cruz

In the context of cybersecurity of modern communications networks, Intrusion Detection Systems (IDS) have been continuously improved, many of them incorporating machine learning (ML) techniques to identify threats. Although there are…

Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic…

Machine Learning · Computer Science 2022-08-29 Daochen Zha , Kwei-Herng Lai , Qiaoyu Tan , Sirui Ding , Na Zou , Xia Hu

In this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem…

Computer Vision and Pattern Recognition · Computer Science 2018-10-16 Mateusz Buda , Atsuto Maki , Maciej A. Mazurowski

Class-imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the…

Machine Learning · Statistics 2018-11-30 Rafael M. O. Cruz , Robert Sabourin , George D. C. Cavalcanti

Despite extensive research spanning several decades, class imbalance is still considered a profound difficulty for both machine learning and deep learning models. While data oversampling is the foremost technique to address this issue,…

Machine Learning · Computer Science 2025-02-12 Sukumar Kishanthan , Asela Hevapathige
‹ Prev 1 2 3 10 Next ›