Related papers: Comparing distributions: $\ell_1$ geometry improve…

Fast Two-Sample Testing with Analytic Representations of Probability Measures

We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses…

Machine Learning · Statistics 2015-06-16 Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, Arthur Gretton

A Kernel Method for the Two-Sample Problem

We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over…

Machine Learning · Computer Science 2008-05-16 Arthur Gretton, Karsten Borgwardt, Malte J. Rasch, Bernhard Scholkopf, Alexander J. Smola

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from "kernel mean embeddings", are leading methods for two-sample…

Machine Learning · Statistics 2024-06-27 Cencheng Shen, Joshua T. Vogelstein

Two-sample Statistics Based on Anisotropic Kernels

The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely-many multivariate samples. When the distributions are locally low-dimensional, the proposed…

Machine Learning · Statistics 2018-09-03 Xiuyuan Cheng, Alexander Cloninger, Ronald R. Coifman

A Kernel-Based Conditional Two-Sample Test Using Nearest Neighbors (with Applications to Calibration, Regression Curves, and Simulation-Based Inference)

In this paper we introduce a kernel-based measure for detecting differences between two conditional distributions. Using the 'kernel trick' and nearest-neighbor graphs, we propose a consistent estimate of this measure which can be computed…

Methodology · Statistics 2024-08-30 Anirban Chatterjee, Ziang Niu, Bhaswar B. Bhattacharya

Distance and Kernel-Based Measures for Global and Local Two-Sample Conditional Distribution Testing

Testing the equality of two conditional distributions is crucial in various modern applications, including transfer learning and causal inference. Despite its importance, this fundamental problem has received surprisingly little attention…

Methodology · Statistics 2025-09-04 Jian Yan, Zhuoxi Li, Xianyang Zhang

Learning Deep Kernels for Non-Parametric Two-Sample Tests

We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test…

Machine Learning · Statistics 2021-01-15 Feng Liu, Wenkai Xu, Jie Lu, Guangquan Zhang, Arthur Gretton, Danica J. Sutherland

Generalized Kernel Two-Sample Tests

Kernel two-sample tests have been widely used for multivariate data to test equality of distributions. However, existing tests based on mapping distributions into a reproducing kernel Hilbert space mainly target specific alternatives and do…

Methodology · Statistics 2023-11-21 Hoseung Song, Hao Chen

On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current…

Statistics Theory · Mathematics 2014-11-25 Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman

Comparing Distributions and Shapes using the Kernel Distance

Starting with a similarity function between objects, it is possible to define a distance metric on pairs of objects, and more generally on probability distributions over them. These distance metrics have a deep basis in functional analysis,…

Computational Geometry · Computer Science 2011-03-15 Sarang Joshi, Raj Varma Kommaraju, Jeff M. Phillips, Suresh Venkatasubramanian

Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing

Nonparametric two sample testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. We refer to the most common…

Statistics Theory · Mathematics 2015-08-05 Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman

A uniform kernel trick for high-dimensional two-sample problems

We use a suitable version of the so-called "kernel trick" to devise two-sample (homogeneity) tests, especially focussed on high-dimensional and functional data. Our proposal entails a simplification related to the important practical…

Statistics Theory · Mathematics 2024-04-24 Javier Cárcamo, Antonio Cuevas, Luis-Alberto Rodríguez

Two-sample Test with Kernel Projected Wasserstein Distance

We develop a kernel projected Wasserstein distance for the two-sample test, an essential building block in statistics and machine learning: given two sets of samples, to determine whether they are from the same distribution. This method…

Statistics Theory · Mathematics 2022-05-10 Jie Wang, Rui Gao, Yao Xie

On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions

This paper is about two related decision theoretic problems, nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and distances between pairs of points, behave…

Machine Learning · Statistics 2014-11-25 Sashank J. Reddi, Aaditya Ramdas, Barnabás Póczos, Aarti Singh, Larry Wasserman

A Note on Optimizing Distributions using Kernel Mean Embeddings

Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used…

Machine Learning · Computer Science 2021-06-29 Boris Muzellec, Francis Bach, Alessandro Rudi

A fast and effective kernel two-sample test for large-scale data

Kernel two-sample tests have been widely used, and the development of efficient methods for high-dimensional, large-scale data is receiving increasing attention in the big data era. However, existing methods, such as the maximum mean…

Methodology · Statistics 2025-10-03 Hoseung Song, Hao Chen

Meta Two-Sample Testing: Learning Kernels for Testing with Limited Data

Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming…

Machine Learning · Statistics 2022-01-06 Feng Liu, Wenkai Xu, Jie Lu, Danica J. Sutherland

A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics

In many contemporary statistical and machine learning methods, one needs to optimize an objective function that depends on the discrepancy between two probability distributions. The discrepancy can be referred to as a metric for…

Machine Learning · Computer Science 2025-02-11 Yijin Ni, Xiaoming Huo

Kernel Tests of Equivalence

We propose novel kernel-based tests for assessing the equivalence between distributions. Traditional goodness-of-fit testing is inappropriate for concluding the absence of distributional differences, because failure to reject the null…

Machine Learning · Statistics 2026-03-17 Xing Liu, Axel Gandy

Hypothesis testing using pairwise distances and associated kernels (with Appendix)

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between…

Machine Learning · Computer Science 2015-03-20 Dino Sejdinovic, Arthur Gretton, Bharath Sriperumbudur, Kenji Fukumizu