Related papers: Instance-Based Classification through Hypothesis T…

A method for classification of data with uncertainty using hypothesis testing

Binary classification is a task that involves the classification of data into one of two distinct classes. It is widely utilized in various fields. However, conventional classifiers tend to make overconfident predictions for data that…

Machine Learning · Computer Science 2025-03-13 Shoma Yokura , Akihisa Ichiki

Two-Sample Test Based on Classification Probability

Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…

Statistics Theory · Mathematics 2019-09-18 Haiyan Cai , Bryan Goggin , Qingtang Jiang

General Framework for Binary Classification on Top Samples

Many binary classification problems minimize misclassification above (or below) a threshold. We show that instances of ranking problems, accuracy at the top or hypothesis testing may be written in this form. We propose a general framework…

Machine Learning · Computer Science 2020-02-26 Lukáš Adam , Václav Mácha , Václav Šmídl , Tomáš Pevný

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

Deep models trained with noisy labels are prone to over-fitting and struggle in generalization. Most existing solutions are based on an ideal assumption that the label noise is class-conditional, i.e., instances of the same class share the…

Computer Vision and Pattern Recognition · Computer Science 2022-08-01 Ganlong Zhao , Guanbin Li , Yipeng Qin , Feng Liu , Yizhou Yu

Sequential Classification with Empirically Observed Statistics

Motivated by real-world machine learning applications, we consider a statistical classification task in a sequential setting where test samples arrive sequentially. In addition, the generating distributions are unknown and only a set of…

Machine Learning · Statistics 2021-02-11 Mahdi Haghifam , Vincent Y. F. Tan , Ashish Khisti

Active Sequential Two-Sample Testing

A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample…

Machine Learning · Computer Science 2024-07-01 Weizhi Li , Prad Kadambi , Pouria Saidi , Karthikeyan Natesan Ramamurthy , Gautam Dasarathy , Visar Berisha

Kernel Two-Sample Hypothesis Testing Using Kernel Set Classification

The two-sample hypothesis testing problem is studied for the challenging scenario of high dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification…

Machine Learning · Statistics 2017-11-15 Hamed Masnadi-Shirazi

A probabilistic methodology for multilabel classification

Multilabel classification is a relatively recent subfield of machine learning. Unlike to the classical approach, where instances are labeled with only one category, in multilabel classification, an arbitrary number of categories is chosen…

Artificial Intelligence · Computer Science 2013-03-01 Alfonso E. Romero , Luis M. de Campos

Two-cluster test

Cluster analysis is a fundamental research issue in statistics and machine learning. In many modern clustering methods, we need to determine whether two subsets of samples come from the same cluster. Since these subsets are usually…

Machine Learning · Computer Science 2025-07-15 Xinying Liu , Lianyu Hu , Mudi Jiang , Simeng Zhang , Jun Lou , Zengyou He

Nearest Neighbor Classification based on Imbalanced Data: A Statistical Approach

When the competing classes in a classification problem are not of comparable size, many popular classifiers exhibit a bias towards larger classes, and the nearest neighbor classifier is no exception. To take care of this problem, we develop…

Methodology · Statistics 2023-11-02 Anvit Garg , Anil K. Ghosh , Soham Sarkar

Revisiting Classifier Two-Sample Tests

The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary…

Machine Learning · Statistics 2018-03-14 David Lopez-Paz , Maxime Oquab

How to Control the Error Rates of Binary Classifiers

The traditional binary classification framework constructs classifiers which may have good accuracy, but whose false positive and false negative error rates are not under users' control. In many cases, one of the errors is more severe and…

Machine Learning · Statistics 2020-10-22 Miloš Simić

Statistical hypothesis testing versus machine-learning binary classification: distinctions and guidelines

Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to…

Applications · Statistics 2021-12-01 Jingyi Jessica Li , Xin Tong

Advanced Tutorial: Label-Efficient Two-Sample Tests

Hypothesis testing is a statistical inference approach used to determine whether data supports a specific hypothesis. An important type is the two-sample test, which evaluates whether two sets of data points are from identical…

Machine Learning · Computer Science 2025-01-08 Weizhi Li , Visar Berisha , Gautam Dasarathy

Second-Order Asymptotically Optimal Statistical Classification

Motivated by real-world machine learning applications, we analyze approximations to the non-asymptotic fundamental limits of statistical classification. In the binary version of this problem, given two training sequences generated according…

Information Theory · Computer Science 2018-12-07 Lin Zhou , Vincent Y. F. Tan , Mehul Motani

Meta-Instance Selection. Instance Selection as a Classification Problem with Meta-Features

Data pruning, or instance selection, is an important problem in machine learning especially in terms of nearest neighbour classifier. However, in data pruning which speeds up the prediction phase, there is an issue related to the speed and…

Machine Learning · Computer Science 2025-01-22 Marcin Blachnik , Piotr Ciepliński

Binary Classification in Unstructured Space With Hypergraph Case-Based Reasoning

Binary classification is one of the most common problem in machine learning. It consists in predicting whether a given element belongs to a particular class. In this paper, a new algorithm for binary classification is proposed using a…

Machine Learning · Computer Science 2019-03-12 Alexandre Quemy

Classification and Bayesian Optimization for Likelihood-Free Inference

Some statistical models are specified via a data generating process for which the likelihood function cannot be computed in closed form. Standard likelihood-based inference is then not feasible but the model parameters can be inferred by…

Computation · Statistics 2015-02-20 Michael U. Gutmann , Jukka Corander , Ritabrata Dutta , Samuel Kaski

Learning about individuals from group statistics

We propose a new problem formulation which is similar to, but more informative than, the binary multiple-instance learning problem. In this setting, we are given groups of instances (described by feature vectors) along with estimates of the…

Machine Learning · Computer Science 2012-07-09 Hendrik Kuck , Nando de Freitas

Statistical comparison of classifiers through Bayesian hierarchical modelling

Usually one compares the accuracy of two competing classifiers via null hypothesis significance tests (nhst). Yet the nhst tests suffer from important shortcomings, which can be overcome by switching to Bayesian hypothesis testing. We…

Machine Learning · Computer Science 2016-11-23 Giorgio Corani , Alessio Benavoli , Janez Demšar , Francesca Mangili , Marco Zaffalon