Related papers: Two-sample test based on maximum variance discrepa…

Testing for homogeneity of several functional variables via multiple maximum variance discrepancy

This paper addresses the problem of testing for the equality of $k$ probability distributions on Hilbert spaces, with $k\geqslant 2$. We introduce a generalization of the maximum variance discrepancy called multiple maximum variance…

Statistics Theory · Mathematics 2024-04-16 Armando Sosthène Kali Balogoun, Guy Martial Nkiet

$k$-Sample problem based on generalized maximum mean discrepancy

In this paper we deal with the problem of testing for the equality of $k$ probability distributions. We introduce a generalization of the maximum mean discrepancy that makes it possible to characterize the null hypothesis. Then, an estimator of it is…

Statistics Theory · Mathematics 2018-11-26 Armando Sosthene Kali Balogoun, Guy Martial Nkiet, Carlos Ogouyandjou

Kernel Tests of Equivalence

We propose novel kernel-based tests for assessing the equivalence between distributions. Traditional goodness-of-fit testing is inappropriate for concluding the absence of distributional differences, because failure to reject the null…

Machine Learning · Statistics 2026-03-17 Xing Liu, Axel Gandy

A Scalable Nystrom-Based Kernel Two-Sample Test with Permutations

Two-sample hypothesis testing, that is, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing,…

Machine Learning · Statistics 2026-04-21 Antoine Chatalic, Marco Letizia, Nicolas Schreuder, Lorenzo Rosasco

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation

We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity with the reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic…

Machine Learning · Statistics 2016-07-04 Qiang Liu, Jason D. Lee, Michael I. Jordan

New normality test in high dimension with kernel methods

A new goodness-of-fit test for normality in high dimension (and in a Reproducing Kernel Hilbert Space) is proposed. It shares common ideas with the Maximum Mean Discrepancy (MMD), which it outperforms both in terms of computation time and applicability…

Statistics Theory · Mathematics 2014-04-14 Jérémie Kellner, Alain Celisse

Two Sample Testing in High Dimension via Maximum Mean Discrepancy

Maximum Mean Discrepancy (MMD) has been widely used in the areas of machine learning and statistics to quantify the distance between two distributions in the $p$-dimensional Euclidean space. The asymptotic property of the sample MMD has…

Statistics Theory · Mathematics 2023-08-29 Hanjia Gao, Xiaofeng Shao

Testing distributional equality for functional random variables

In this article, we present a nonparametric method for the general two-sample problem involving functional random variables modelled as elements of a separable Hilbert space ${\cal H}$. First, we present a general recipe based on linear…

Methodology · Statistics 2024-10-08 Bilol Banerjee

Asymptotics and practical aspects of testing normality with kernel methods

This paper is concerned with testing normality in a Hilbert space based on the maximum mean discrepancy. Specifically, we discuss the behavior of the test from two standpoints: asymptotics and practical aspects. Asymptotic normality of the…

Statistics Theory · Mathematics 2019-02-12 Natsumi Makigusa, Kanta Naito

MMD Two-sample Testing in the Presence of Arbitrarily Missing Data

In many real-world applications, it is common that a proportion of the data may be missing or only partially observed. We develop a novel two-sample testing method based on the Maximum Mean Discrepancy (MMD) which accounts for missing data…

Methodology · Statistics 2024-05-27 Yijin Zeng, Niall M. Adams, Dean A. Bodenham

Goodness-of-Fit Tests for Censored and Truncated Data: Maximum Mean Discrepancy Over Regular Functionals

We develop a systematic, omnibus approach to goodness-of-fit testing for parametric distributional models when the variable of interest is only partially observed due to censoring and/or truncation. In many such designs, tests based on the…

Methodology · Statistics 2026-02-10 Juan Carlos Escanciano, Jacobo de Uña-Álvarez

High Probability Lower Bounds for the Total Variation Distance

The statistics and machine learning communities have recently seen a growing interest in classification-based approaches to two-sample testing. The outcome of a classification-based two-sample test remains a rejection decision, which is not…

Statistics Theory · Mathematics 2022-11-15 Loris Michel, Jeffrey Näf, Nicolai Meinshausen

A Witness Two-Sample Test

The Maximum Mean Discrepancy (MMD) has been the state-of-the-art nonparametric test for tackling the two-sample problem. Its statistic is given by the difference in expectations of the witness function, a real-valued function defined as a…

Machine Learning · Computer Science 2022-02-14 Jonas M. Kübler, Wittawat Jitkrittum, Bernhard Schölkopf, Krikamol Muandet

Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics

Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require…

Machine Learning · Statistics 2025-12-17 Aaron Wei, Milad Jalali, Danica J. Sutherland

Unbiased estimators for the variance of MMD estimators

The maximum mean discrepancy (MMD) is a kernel-based distance between probability distributions useful in many applications (Gretton et al. 2012), bearing a simple estimator with pleasing computational and statistical properties. Being able…

Machine Learning · Statistics 2022-11-16 Danica J. Sutherland, Namrata Deka

Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison

We study two-sample variable selection: identifying variables that discriminate between the distributions of two sets of data vectors. Such variables help scientists understand the mechanisms behind dataset discrepancies. Although…

Machine Learning · Statistics 2025-11-06 Kensuke Mitsuzawa, Motonobu Kanagawa, Stefano Bortoli, Margherita Grossi, Paolo Papotti

Signature Maximum Mean Discrepancy Two-Sample Statistical Tests

Maximum Mean Discrepancy (MMD) is a widely used concept in machine learning research which has gained popularity in recent years as a highly effective tool for comparing (finite-dimensional) distributions. Since it is designed as a…

Machine Learning · Statistics 2025-06-03 Andrew Alden, Blanka Horvath, Zacharia Issa

Statistical divergences in high-dimensional hypothesis testing and a modern technique for estimating them

Hypothesis testing in high dimensional data is a notoriously difficult problem without direct access to competing models' likelihood functions. This paper argues that statistical divergences can be used to quantify the difference between…

Data Analysis, Statistics and Probability · Physics 2024-08-02 Jeremy J. H. Wilkinson, Christopher G. Lester

Maximum-of-Differences Test for Comparing Multivariate K-Sample Distributions

Comparing $K$-sample distributions is a fundamental problem in data science that arises in a wide variety of fields and applications. In this article, we introduce a maximum-of-differences approach to make such comparisons. Specifically, we…

Methodology · Statistics 2026-04-13 Wei Lan, Long Feng, Runze Li, Chih-Ling Tsai

Distribution-free two-sample testing with blurred total variation distance

Two-sample testing, where we aim to determine whether two distributions are equal or not equal based on samples from each one, is challenging if we cannot place assumptions on the properties of the two distributions. In particular,…

Machine Learning · Statistics 2026-04-13 Rohan Hore, Rina Foygel Barber