Related papers: $F_\beta$-plot -- a visual tool for evaluating imb…

On Model Evaluation under Non-constant Class Imbalance

Many real-world classification problems are significantly class-imbalanced to the detriment of the class of interest. The standard set of proper evaluation metrics is well known, but the usual assumption is that the test dataset imbalance equals…

Machine Learning · Computer Science 2020-04-16 Jan Brabec, Tomáš Komárek, Vojtěch Franc, Lukáš Machlica

Visual-Based Analysis of Classification Measures with Applications to Imbalanced Data

With a plethora of available classification performance measures, choosing the right metric for the right task requires careful thought. To make this decision in an informed manner, one should study and compare general properties of…

Other Computer Science · Computer Science 2020-07-30 Dariusz Brzezinski, Jerzy Stefanowski, Robert Susmaga, Izabela Szczęch

Bad practices in evaluation methodology relevant to class-imbalanced problems

For research to go in the right direction, it is essential to be able to compare and quantify performance of different algorithms focused on the same problem. Choosing a suitable evaluation metric requires deep understanding of the pursued…

Machine Learning · Computer Science 2018-12-05 Jan Brabec, Lukas Machlica

A Skew-Sensitive Evaluation Framework for Imbalanced Data Classification

Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes, making fair assessment of classifiers a challenging task. Metrics such as Balanced Accuracy are commonly used to evaluate a…

Machine Learning · Computer Science 2023-11-20 Min Du, Nesime Tatbul, Brian Rivers, Akhilesh Kumar Gupta, Lucas Hu, Wei Wang, Ryan Marcus, Shengtian Zhou, Insup Lee, Justin Gottschlich

A Minimax Probability Machine for Non-Decomposable Performance Measures

Imbalanced classification tasks are widespread in many real-world applications. For such classification tasks, in comparison with the accuracy rate, it is usually much more appropriate to use non-decomposable performance measures such as…

Machine Learning · Computer Science 2021-03-16 Junru Luo, Hong Qiao, Bo Zhang

A surrogate loss function for optimization of $F_\beta$ score in binary classification with imbalanced data

The $F_\beta$ score is a commonly used measure of classification performance, which plays crucial roles in classification tasks with imbalanced data sets. However, the $F_\beta$ score cannot be used as a loss function by gradient-based…

Machine Learning · Computer Science 2021-04-06 Namgil Lee, Heejung Yang, Hojin Yoo

Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts

Class-level evaluation can conceal substantial performance disparities across subconcepts within the same class, causing models that perform well on average to fail on specific subpopulations. Prior work has shown that common evaluation…

Machine Learning · Computer Science 2026-04-30 Taylor Maxson, Roberto Corizzo, Yaning Wu, Nathalie Japkowicz, Colin Bellinger

An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification

Learning from imbalanced data is a challenging task. Standard classification algorithms tend to perform poorly when trained on imbalanced data. Some special strategies need to be adopted, either by modifying the data distribution or by…

Machine Learning · Computer Science 2022-08-26 Asif Newaz, Shahriar Hassan, Farhan Shahriyar Haq

F-measure Maximizing Logistic Regression

Logistic regression is a widely used method in several fields. When applying logistic regression to imbalanced data, for which majority classes dominate over minority classes, all class labels are estimated as 'majority class.' In this…

Methodology · Statistics 2025-08-20 Masaaki Okabe, Jun Tsuchida, Hiroshi Yadohisa

Resampling strategies for imbalanced regression: a survey and empirical analysis

Imbalanced problems can arise in different real-world situations, and to address this, certain strategies in the form of resampling or balancing algorithms are proposed. This issue has largely been studied in the context of classification,…

Machine Learning · Computer Science 2025-07-17 Juscimara G. Avelino, George D. C. Cavalcanti, Rafael M. O. Cruz

A Survey of Methods for Managing the Classification and Solution of Data Imbalance Problem

The problem of class imbalance is extensive for focusing on numerous applications in the real world. In such a situation, nearly all of the examples are labeled as one class called majority class, while far fewer examples are labeled as the…

Machine Learning · Computer Science 2020-12-23 Khan Md. Hasib, Md. Sadiq Iqbal, Faisal Muhammad Shah, Jubayer Al Mahmud, Mahmudul Hasan Popel, Md. Imran Hossain Showrov, Shakil Ahmed, Obaidur Rahman

Goodness of Fit Metrics for Multi-class Predictor

Multi-class prediction has gained popularity in recent years. Thus, measuring goodness of fit becomes a cardinal question that researchers often have to deal with. Several metrics are commonly used for this task. However, when one has to…

Machine Learning · Computer Science 2022-08-12 Uri Itai, Natan Katz

Good Classification Measures and How to Find Them

Several performance measures can be used for evaluating classification results: accuracy, F-measure, and many others. Can we say that some of them are better than others, or, ideally, choose one measure that is best in all situations? To…

Machine Learning · Computer Science 2022-01-25 Martijn Gösgens, Anton Zhiyanov, Alexey Tikhonov, Liudmila Prokhorenkova

Balanced Split: A new train-test data splitting strategy for imbalanced datasets

Classification data sets with skewed class proportions are called imbalanced. Class imbalance is a problem since most machine learning classification algorithms are built with an assumption of equal representation of all classes in the…

Machine Learning · Computer Science 2022-12-22 Azal Ahmad Khan

Measuring Class-Imbalance Sensitivity of Deterministic Performance Evaluation Metrics

The class-imbalance issue is intrinsic to many real-world machine learning tasks, particularly to the rare-event classification problems. Although the impact and treatment of imbalanced data is widely known, the magnitude of a metric's…

Machine Learning · Computer Science 2022-06-22 Azim Ahmadzadeh, Rafal A. Angryk

Review of Methods for Handling Class-Imbalanced in Classification Problems

Learning classifiers using skewed or imbalanced datasets can occasionally lead to classification issues; this is a serious issue. In some cases, one class contains the majority of examples while the other, which is frequently the more…

Machine Learning · Computer Science 2022-11-11 Satyendra Singh Rawat, Amit Kumar Mishra

A Survey of Predictive Modelling under Imbalanced Distributions

Many real world data mining applications involve obtaining predictive models using data sets with strongly imbalanced distributions of the target variable. Frequently, the least common values of this target variable are associated with…

Machine Learning · Computer Science 2015-05-14 Paula Branco, Luis Torgo, Rita Ribeiro

Imbalanced Classification via Explicit Gradient Learning From Augmented Data

Learning from imbalanced data is one of the most significant challenges in real-world classification tasks. In such cases, neural network performance is substantially impaired due to a preference towards the majority class. Existing…

Machine Learning · Computer Science 2022-11-13 Bronislav Yasinnik, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch

Constrained Classification and Ranking via Quantiles

In most machine learning applications, classification accuracy is not the primary metric of interest. Binary classifiers which face class imbalance are often evaluated by the $F_\beta$ score, area under the precision-recall curve, Precision…

Machine Learning · Computer Science 2018-03-02 Alan Mackey, Xiyang Luo, Elad Eban

Reformulating van Rijsbergen's $F_{\beta}$ metric for weighted binary cross-entropy

The separation of performance metrics from gradient based loss functions may not always give optimal results and may miss vital aggregate information. This paper investigates incorporating a performance metric alongside differentiable loss…

Machine Learning · Statistics 2025-07-08 Satesh Ramdhani