Related papers: Dealing with Class Imbalance using Thresholding
Output thresholding is the technique to search for the best threshold to be used during inference for any classifiers that can produce probability estimates on train and testing datasets. It is particularly useful in high imbalance…
Learning classifiers using skewed or imbalanced datasets can occasionally lead to classification issues; this is a serious issue. In some cases, one class contains the majority of examples while the other, which is frequently the more…
In order to allow machine learning algorithms to extract knowledge from raw data, these data must first be cleaned, transformed, and put into machine-appropriate form. These often very time-consuming phase is referred to as preprocessing.…
A common problem of the real-world data sets is the class imbalance, which can significantly affect the classification abilities of classifiers. Numerous methods have been proposed to cope with this problem; however, even state-of-the-art…
The one-class classification problem is a well-known research endeavor in pattern recognition. The problem is also known under different names, such as outlier and novelty/anomaly detection. The core of the problem consists in modeling and…
In the realm of unsupervised image outlier detection, assigning outlier scores holds greater significance than its subsequent task: thresholding for predicting labels. This is because determining the optimal threshold on non-separable…
In high-dimensional classification settings, we wish to seek a balance between high power and ensuring control over a desired loss function. In many settings, the points most likely to be misclassified are those who lie near the decision…
In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue is due to either a data collection process or because one class is indeed rare in a population. Imbalanced classification…
Despite extensive research spanning several decades, class imbalance is still considered a profound difficulty for both machine learning and deep learning models. While data oversampling is the foremost technique to address this issue,…
Many real-world classification problems are significantly class-imbalanced to detriment of the class of interest. The standard set of proper evaluation metrics is well-known but the usual assumption is that the test dataset imbalance equals…
In statistics and machine learning, the traditional meaning of the terms `outlier' and `anomaly' is a case in the dataset that behaves differently from the bulk of the data. This raises suspicion that it may belong to a different…
In a standard classification framework a set of trustworthy learning data are employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Therefore, unreliable labelled observations,…
Unsupervised outlier detection constitutes a crucial phase within data analysis and remains a dynamic realm of research. A good outlier detection algorithm should be computationally efficient, robust to tuning parameter selection, and…
Imbalance learning is a subfield of machine learning that focuses on learning tasks in the presence of class imbalance. Nearly all existing studies refer to class imbalance as a proportion imbalance, where the proportion of training samples…
Active learning methods increase classification quality by means of user feedback. An important subcategory is active learning for outlier detection with one-class classifiers. While various methods in this category exist, selecting one for…
The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently. In many cases, most observations have the same behavior, while a probable small proportion of these observations are…
Outlier detection aims to identify unusual data instances that deviate from expected patterns. The outlier detection is particularly challenging when outliers are context dependent and when they are defined by unusual combinations of…
Outlier detection is an important problem occurring in a wide range of areas. Outliers are the outcome of fraudulent behaviour, mechanical faults, human error, or simply natural deviations. Many data mining applications perform outlier…
In this paper we introduce a new method for detecting outliers in a set of proportions. It is based on the construction of a suitable two-way contingency table and on the application of an algorithm for the detection of outlying cells in…
When neural networks are employed for high-stakes decision-making, it is desirable that they provide explanations for their prediction in order for us to understand the features that have contributed to the decision. At the same time, it is…