Related papers: Advanced Tutorial: Label-Efficient Two-Sample Test…

Active Sequential Two-Sample Testing

A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample…

Machine Learning · Computer Science 2024-07-01 Weizhi Li , Prad Kadambi , Pouria Saidi , Karthikeyan Natesan Ramamurthy , Gautam Dasarathy , Visar Berisha

A label-efficient two-sample test

Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…

Machine Learning · Computer Science 2022-07-20 Weizhi Li , Gautam Dasarathy , Karthikeyan Natesan Ramamurthy , Visar Berisha

Towards Integration of Statistical Hypothesis Tests into Deep Neural Networks

We report our ongoing work about a new deep architecture working in tandem with a statistical test procedure for jointly training texts and their label descriptions for multi-label and multi-class classification tasks. A statistical…

Computation and Language · Computer Science 2019-06-18 Ahmad Aghaebrahimian , Mark Cieliebak

Two-sample Testing Using Deep Learning

We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are…

Machine Learning · Statistics 2020-03-11 Matthias Kirchler , Shahryar Khorasani , Marius Kloft , Christoph Lippert

Active Testing: Sample-Efficient Model Evaluation

We introduce a new framework for sample-efficient model evaluation that we call active testing. While approaches like active learning reduce the number of labels needed for model training, existing literature largely ignores the cost of…

Machine Learning · Statistics 2021-06-15 Jannik Kossen , Sebastian Farquhar , Yarin Gal , Tom Rainforth

Statistical divergences in high-dimensional hypothesis testing and a modern technique for estimating them

Hypothesis testing in high dimensional data is a notoriously difficult problem without direct access to competing models' likelihood functions. This paper argues that statistical divergences can be used to quantify the difference between…

Data Analysis, Statistics and Probability · Physics 2024-08-02 Jeremy J. H. Wilkinson , Christopher G. Lester

Global and Local Two-Sample Tests via Regression

Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature,…

Methodology · Statistics 2019-11-19 Ilmun Kim , Ann B. Lee , Jing Lei

Two-cluster test

Cluster analysis is a fundamental research issue in statistics and machine learning. In many modern clustering methods, we need to determine whether two subsets of samples come from the same cluster. Since these subsets are usually…

Machine Learning · Computer Science 2025-07-15 Xinying Liu , Lianyu Hu , Mudi Jiang , Simeng Zhang , Jun Lou , Zengyou He

AutoML Two-Sample Test

Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery as well as to detect distribution shifts. This led to the development of many sophisticated test procedures going beyond the standard…

Machine Learning · Computer Science 2023-01-18 Jonas M. Kübler , Vincent Stimper , Simon Buchholz , Krikamol Muandet , Bernhard Schölkopf

Two-Sample Test Based on Classification Probability

Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…

Statistics Theory · Mathematics 2019-09-18 Haiyan Cai , Bryan Goggin , Qingtang Jiang

Significance Analysis of High-Dimensional, Low-Sample Size Partially Labeled Data

Classification and clustering are both important topics in statistical learning. A natural question herein is whether predefined classes are really different from one another, or whether clusters are really there. Specifically, we may be…

Machine Learning · Statistics 2015-09-22 Qiyi Lu , Xingye Qiao

Practical methods for graph two-sample testing

Hypothesis testing for graphs has been an important tool in applied research fields for more than two decades, and still remains a challenging problem as one often needs to draw inference from few replicates of large graphs. Recent studies…

Machine Learning · Statistics 2018-12-03 Debarghya Ghoshdastidar , Ulrike von Luxburg

Comparing Generative Models with the New Physics Learning Machine

The rise of generative models for scientific research calls for the development of new methods to evaluate their fidelity. A natural framework for addressing this problem is two-sample hypothesis testing, namely the task of determining…

Machine Learning · Statistics 2025-08-05 Samuele Grossi , Marco Letizia , Riccardo Torre

Instance-Based Classification through Hypothesis Testing

Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the…

Machine Learning · Computer Science 2019-04-23 Zengyou He , Chaohua Sheng , Yan Liu , Quan Zou

Large Deviation Analysis of Score-based Hypothesis Testing

Score-based statistical models play an important role in modern machine learning, statistics, and signal processing. For hypothesis testing, a score-based hypothesis test is proposed in \cite{wu2022score}. We analyze the performance of this…

Signal Processing · Electrical Eng. & Systems 2024-02-06 Enmao Diao , Taposh Banerjee , Vahid Tarokh

Post-clustering difference testing: valid inference and practical considerations

Clustering is part of unsupervised analysis methods that consist in grouping samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…

Methodology · Statistics 2022-10-25 Benjamin Hivert , Denis Agniel , Rodolphe Thiébaut , Boris P Hejblum

Towards Visually Explaining Statistical Tests with Applications in Biomedical Imaging

Deep neural two-sample tests have recently shown strong power for detecting distributional differences between groups, yet their black-box nature limits interpretability and practical adoption in biomedical analysis. Moreover, most existing…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Masoumeh Javanbakhat , Piotr Komorowski , Dilyara Bareeva , Wei-Chang Lai , Wojciech Samek , Christoph Lippert

Revisiting Classifier Two-Sample Tests

The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary…

Machine Learning · Statistics 2018-03-14 David Lopez-Paz , Maxime Oquab

Two-Sample Testing on Ranked Preference Data and the Role of Modeling Assumptions

A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similar to…

Machine Learning · Statistics 2020-11-20 Charvi Rastogi , Sivaraman Balakrishnan , Nihar B. Shah , Aarti Singh

Hypothesis testing at the extremes: fast and robust association for high-throughput data

A number of biomedical problems require performing many hypothesis tests, with an attendant need to apply stringent thresholds. Often the data take the form of a series of predictor vectors, each of which must be compared with a single…

Methodology · Statistics 2014-05-13 Yi-Hui Zhou , Fred Wright