Related papers: Kernel-Based Tests for Likelihood-Free Hypothesis …

Likelihood-free hypothesis testing

Consider the problem of binary hypothesis testing. Given $Z$ coming from either $\mathbb P^{\otimes m}$ or $\mathbb Q^{\otimes m}$, to decide between the two with small probability of error it is sufficient, and in many cases necessary, to…

Statistics Theory · Mathematics 2024-03-11 Patrik Róbert Gerber , Yury Polyanskiy

Kernel Two-Sample Hypothesis Testing Using Kernel Set Classification

The two-sample hypothesis testing problem is studied for the challenging scenario of high dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification…

Machine Learning · Statistics 2017-11-15 Hamed Masnadi-Shirazi

Bayesian Kernel Two-Sample Testing

In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where…

Methodology · Statistics 2022-01-25 Qinyi Zhang , Veit Wild , Sarah Filippi , Seth Flaxman , Dino Sejdinovic

Likelihood Ratio Tests by Kernel Gaussian Embedding

We propose a novel kernel-based nonparametric two-sample test, employing the combined use of kernel mean and kernel covariance embedding. Our test builds on recent results showing how such combined embeddings map distinct probability…

Machine Learning · Statistics 2025-09-16 Leonardo V. Santoro , Victor M. Panaretos

Learning Deep Kernels for Non-Parametric Two-Sample Tests

We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test…

Machine Learning · Statistics 2021-01-15 Feng Liu , Wenkai Xu , Jie Lu , Guangquan Zhang , Arthur Gretton , Danica J. Sutherland

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming…

Machine Learning · Statistics 2022-01-06 Feng Liu , Wenkai Xu , Jie Lu , Danica J. Sutherland

A Scalable Nystrom-Based Kernel Two-Sample Test with Permutations

Two-sample hypothesis testing-determining whether two sets of data are drawn from the same distribution-is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing,…

Machine Learning · Statistics 2026-04-21 Antoine Chatalic , Marco Letizia , Nicolas Schreuder , Lorenzo Rosasco

Kernel Two-Sample Tests for Manifold Data

We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations are close to a low-dimensional manifold. We characterize…

Machine Learning · Statistics 2024-02-27 Xiuyuan Cheng , Yao Xie

On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current…

Statistics Theory · Mathematics 2014-11-25 Aaditya Ramdas , Sashank J. Reddi , Barnabas Poczos , Aarti Singh , Larry Wasserman

Algorithms and Fundamental Limits for Unlabeled Detection using Types

Emerging applications of sensor networks for detection sometimes suggest that classical problems ought be revisited under new assumptions. This is the case of binary hypothesis testing with independent - but not necessarily identically…

Information Theory · Computer Science 2019-03-27 Stefano Marano , Peter Willett

Classification with High-Dimensional Sparse Samples

The task of the binary classification problem is to determine which of two distributions has generated a length-$n$ test sequence. The two distributions are unknown; two training sequences of length $N$, one from each distribution, are…

Information Theory · Computer Science 2016-04-18 Dayu Huang , Sean Meyn

A Kernel-Based Conditional Two-Sample Test Using Nearest Neighbors (with Applications to Calibration, Regression Curves, and Simulation-Based Inference)

In this paper we introduce a kernel-based measure for detecting differences between two conditional distributions. Using the `kernel trick' and nearest-neighbor graphs, we propose a consistent estimate of this measure which can be computed…

Methodology · Statistics 2024-08-30 Anirban Chatterjee , Ziang Niu , Bhaswar B. Bhattacharya

Learning from M-Tuple Dominant Positive and Unlabeled Data

Label Proportion Learning (LLP) addresses the classification problem where multiple instances are grouped into bags and each bag contains information about the proportion of each class. However, in practical applications, obtaining precise…

Machine Learning · Computer Science 2025-07-15 Jiahe Qin , Junpeng Li , Changchun Hua , Yana Yang

A Kernel Two-Sample Test for Functional Data

We propose a nonparametric two-sample test procedure based on Maximum Mean Discrepancy (MMD) for testing the hypothesis that two samples of functions have the same underlying distribution, using kernels defined on function spaces. This…

Statistics Theory · Mathematics 2020-10-20 George Wynne , Andrew B. Duncan

M2m: Imbalanced Classification via Major-to-minor Translation

In most real-world scenarios, labeled training datasets are highly class-imbalanced, where deep neural networks suffer from generalizing to a balanced testing criterion. In this paper, we explore a novel yet simple way to alleviate this…

Computer Vision and Pattern Recognition · Computer Science 2020-12-22 Jaehyung Kim , Jongheon Jeong , Jinwoo Shin

Likelihood-based semi-supervised model selection with applications to speech processing

In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some…

Machine Learning · Statistics 2011-08-25 Christopher M. White , Sanjeev P. Khudanpur , Patrick J. Wolfe

Wald-Kernel: Learning to Aggregate Information for Sequential Inference

Sequential hypothesis testing is a desirable decision making strategy in any time sensitive scenario. Compared with fixed sample-size testing, sequential testing is capable of achieving identical probability of error requirements using less…

Machine Learning · Statistics 2017-11-17 Diyan Teng , Emre Ertin

Semiparametric Learning from Open-Set Label Shift Data

We study the open-set label shift problem, where the test data may include a novel class absent from training. This setting is challenging because both the class proportions and the distribution of the novel class are not identifiable…

Methodology · Statistics 2025-09-19 Siyan Liu , Yukun Liu , Qinglong Tian , Pengfei Li , Jing Qin

Kernel Hypothesis Testing with Set-valued Data

We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources such as groups of observations in time series, collections of words in text or a batch of…

Methodology · Statistics 2021-02-03 Alexis Bellot , Mihaela van der Schaar

Testing Hypotheses by Regularized Maximum Mean Discrepancy

Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare…

Machine Learning · Computer Science 2013-05-03 Somayeh Danafar , Paola M. V. Rancoita , Tobias Glasmachers , Kevin Whittingstall , Juergen Schmidhuber