Related papers: Instance-Based Classification through Hypothesis T…
Binary classification is a task that involves the classification of data into one of two distinct classes. It is widely utilized in various fields. However, conventional classifiers tend to make overconfident predictions for data that…
Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…
Many binary classification problems minimize misclassification above (or below) a threshold. We show that instances of ranking problems, accuracy at the top or hypothesis testing may be written in this form. We propose a general framework…
Deep models trained with noisy labels are prone to over-fitting and struggle in generalization. Most existing solutions are based on an ideal assumption that the label noise is class-conditional, i.e., instances of the same class share the…
Motivated by real-world machine learning applications, we consider a statistical classification task in a sequential setting where test samples arrive sequentially. In addition, the generating distributions are unknown and only a set of…
A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample…
The two-sample hypothesis testing problem is studied for the challenging scenario of high dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification…
Multilabel classification is a relatively recent subfield of machine learning. Unlike to the classical approach, where instances are labeled with only one category, in multilabel classification, an arbitrary number of categories is chosen…
Cluster analysis is a fundamental research issue in statistics and machine learning. In many modern clustering methods, we need to determine whether two subsets of samples come from the same cluster. Since these subsets are usually…
When the competing classes in a classification problem are not of comparable size, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbor classifier is no exception. To take care of this problem, we develop…
The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary…
The traditional binary classification framework constructs classifiers which may have good accuracy, but whose false positive and false negative error rates are not under users' control. In many cases, one of the errors is more severe and…
Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to…
Hypothesis testing is a statistical inference approach used to determine whether data supports a specific hypothesis. An important type is the two-sample test, which evaluates whether two sets of data points are from identical…
Motivated by real-world machine learning applications, we analyze approximations to the non-asymptotic fundamental limits of statistical classification. In the binary version of this problem, given two training sequences generated according…
Data pruning, or instance selection, is an important problem in machine learning especially in terms of nearest neighbour classifier. However, in data pruning which speeds up the prediction phase, there is an issue related to the speed and…
Binary classification is one of the most common problem in machine learning. It consists in predicting whether a given element belongs to a particular class. In this paper, a new algorithm for binary classification is proposed using a…
Some statistical models are specified via a data generating process for which the likelihood function cannot be computed in closed form. Standard likelihood-based inference is then not feasible but the model parameters can be inferred by…
We propose a new problem formulation which is similar to, but more informative than, the binary multiple-instance learning problem. In this setting, we are given groups of instances (described by feature vectors) along with estimates of the…
Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (nhst). Yet the nhst tests suffer from important shortcomings, which can be overcome by switching to Bayesian hypothesis testing. We…