Related papers: Bounding the Error From Reference Set Kernel Maxim…

Partial identification of kernel based two sample tests with mismeasured data

Nonparametric two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications. However, the majority of existing literature assumes that error-free…

Machine Learning · Statistics 2023-08-08 Ron Nafshi , Maggie Makar

A Witness Two-Sample Test

The Maximum Mean Discrepancy (MMD) has been the state-of-the-art nonparametric test for tackling the two-sample problem. Its statistic is given by the difference in expectations of the witness function, a real-valued function defined as a…

Machine Learning · Computer Science 2022-02-14 Jonas M. Kübler , Wittawat Jitkrittum , Bernhard Schölkopf , Krikamol Muandet

Kernel Discrepancy-Based Rerandomization for Controlled Experiments

This paper introduces a kernel discrepancy-based framework for rerandomization to enhance the precision of causal inference in controlled experiments. We demonstrate that the kernel discrepancy is the key part of the variance upper bound…

Methodology · Statistics 2025-11-05 Yiou Li , Lulu Kang

Strictly Proper Kernel Scoring Rules and Divergences with an Application to Kernel Two-Sample Hypothesis Testing

We study strictly proper scoring rules in the Reproducing Kernel Hilbert Space. We propose a general Kernel Scoring rule and associated Kernel Divergence. We consider conditions under which the Kernel Score is strictly proper. We then…

Machine Learning · Statistics 2017-04-25 Hamed Masnadi-Shirazi

Keep it Tighter -- A Story on Analytical Mean Embeddings

Kernel techniques are among the most popular and flexible approaches in data science allowing to represent probability measures without loss of information under mild conditions. The resulting mapping called mean embedding gives rise to a…

Machine Learning · Statistics 2024-11-27 Linda Chamakh , Zoltan Szabo

Sparse Approximation of a Kernel Mean

Kernel means are frequently used to represent probability distributions in machine learning problems. In particular, the well known kernel density estimator and the kernel mean embedding both have the form of a kernel mean. Unfortunately,…

Machine Learning · Statistics 2015-03-03 E. Cruz Cortés , C. Scott

A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics

In many contemporary statistical and machine learning methods, one needs to optimize an objective function that depends on the discrepancy between two probability distributions. The discrepancy can be referred to as a metric for…

Machine Learning · Computer Science 2025-02-11 Yijin Ni , Xiaoming Huo

Variable Selection for Kernel Two-Sample Tests

We consider the variable selection problem for two-sample tests, aiming to select the most informative variables to determine whether two collections of samples follow the same distribution. To address this, we propose a novel framework…

Machine Learning · Statistics 2024-12-23 Jie Wang , Santanu S. Dey , Yao Xie

Scalable Kernel-Based Distances for Statistical Inference and Integration

Representing, comparing, and measuring the distance between probability distributions is a key task in computational statistics and machine learning. The choice of representation and the associated distance determine properties of the…

Machine Learning · Statistics 2026-02-26 Masha Naslidnyk

Kernel Selection is Model Selection: A Unified Complexity-Penalized Approach for MMD Two-Sample Tests

The Maximum Mean Discrepancy (MMD) is a cornerstone statistic for nonparametric two-sample testing, but its test power is dictated entirely by the chosen kernel. Because any fixed kernel inherently fails to distinguish certain…

Machine Learning · Statistics 2026-05-11 Yijin Ni , Xiaoming Huo

A Scalable Nystrom-Based Kernel Two-Sample Test with Permutations

Two-sample hypothesis testing-determining whether two sets of data are drawn from the same distribution-is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing,…

Machine Learning · Statistics 2026-04-21 Antoine Chatalic , Marco Letizia , Nicolas Schreuder , Lorenzo Rosasco

Kernel Two-Sample Testing via Directional Components Analysis

We propose a novel kernel-based two-sample test that leverages the spectral decomposition of the maximum mean discrepancy (MMD) statistic to identify and utilize well-estimated directional components in reproducing kernel Hilbert space…

Methodology · Statistics 2025-08-21 Rui Cui , Yuhao Li , Xiaojun Song

Error Bounds for Kernel-Based Linear System Identification with Unknown Hyperparameters

The kernel-based method has been successfully applied in linear system identification using stable kernel designs. From a Gaussian process perspective, it automatically provides probabilistic error bounds for the identified models from the…

Systems and Control · Electrical Eng. & Systems 2023-03-20 Mingzhou Yin , Roy S. Smith

Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders

Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and sample sizes both diverge to infinity. We focus on the…

Statistics Theory · Mathematics 2024-10-31 Jian Yan , Xianyang Zhang

Targeted Separation and Convergence with Kernel Discrepancies

Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each…

Machine Learning · Statistics 2025-03-26 Alessandro Barp , Carl-Johann Simon-Gabriel , Mark Girolami , Lester Mackey

Error Bounds for a Kernel-Based Constrained Optimal Smoothing Approximation

This paper establishes error bounds for the convergence of a piecewise linear approximation of the constrained optimal smoothing problem posed in a reproducing kernel Hilbert space (RKHS). This problem can be reformulated as a Bayesian…

Statistics Theory · Mathematics 2025-06-24 Laurence Grammont , François Bachoc , Andrés F. López-Lopera

Calibrated Reliable Regression using Maximum Mean Discrepancy

Accurate quantification of uncertainty is crucial for real-world applications of machine learning. However, modern deep neural networks still produce unreliable predictive uncertainty, often yielding over-confident predictions. In this…

Machine Learning · Computer Science 2020-10-29 Peng Cui , Wenbo Hu , Jun Zhu

Optimal kernel regression bounds under energy-bounded noise

Non-conservative uncertainty bounds are key for both assessing an estimation algorithm's accuracy and in view of downstream tasks, such as its deployment in safety-critical contexts. In this paper, we derive a tight, non-asymptotic…

Machine Learning · Computer Science 2026-01-16 Amon Lahr , Johannes Köhler , Anna Scampicchio , Melanie N. Zeilinger

Testing Hypotheses by Regularized Maximum Mean Discrepancy

Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare…

Machine Learning · Computer Science 2013-05-03 Somayeh Danafar , Paola M. V. Rancoita , Tobias Glasmachers , Kevin Whittingstall , Juergen Schmidhuber

Universal Hypothesis Testing with Kernels: Asymptotically Optimal Tests for Goodness of Fit

We characterize the asymptotic performance of nonparametric goodness of fit testing. The exponential decay rate of the type-II error probability is used as the asymptotic performance metric, and a test is optimal if it achieves the maximum…

Machine Learning · Statistics 2019-03-19 Shengyu Zhu , Biao Chen , Pengfei Yang , Zhitang Chen