Related papers: A Kernel Method for the Two-Sample Problem
We propose a novel test procedure for comparing mean functions across two groups within the reproducing kernel Hilbert space (RKHS) framework. Our proposed method is adept at handling sparsely and irregularly sampled functional data when…
We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses…
In this paper we introduce a kernel-based measure for detecting differences between two conditional distributions. Using the `kernel trick' and nearest-neighbor graphs, we propose a consistent estimate of this measure which can be computed…
In this paper we deal with the problem of testing for the equality of $k$ probability distributions defined on $(\mathcal{X},\mathcal{B})$, where $\mathcal{X}$ is a metric space and $\mathcal{B}$ is the corresponding Borel $\sigma$-field.…
Kernel two-sample tests have been widely used for multivariate data to test equality of distributions. However, existing tests based on mapping distributions into a reproducing kernel Hilbert space mainly target specific alternatives and do…
Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive…
In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where…
Over the last decade, an approach that has gained a lot of popularity to tackle nonparametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of…
Are two sets of observations drawn from the same distribution? This problem is a two-sample test. Kernel methods lead to many appealing properties. Indeed state-of-the-art approaches use the $L^2$ distance between kernel-based distribution…
The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely-many multivariate samples. When the distributions are locally low-dimensional, the proposed…
We propose a nonparametric two-sample test procedure based on Maximum Mean Discrepancy (MMD) for testing the hypothesis that two samples of functions have the same underlying distribution, using kernels defined on function spaces. This…
We use a suitable version of the so-called "kernel trick" to devise two-sample (homogeneity) tests, especially focussed on high-dimensional and functional data. Our proposal entails a simplification related to the important practical…
The two-sample hypothesis testing problem is studied for the challenging scenario of high dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification…
We propose a novel kernel-based two-sample test that leverages the spectral decomposition of the maximum mean discrepancy (MMD) statistic to identify and utilize well-estimated directional components in reproducing kernel Hilbert space…
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between…
In many contemporary statistical and machine learning methods, one needs to optimize an objective function that depends on the discrepancy between two probability distributions. The discrepancy can be referred to as a metric for…
Reproducing Kernel Hilbert Space (RKHS) embedding of probability distributions has proved to be an effective approach, via MMD (maximum mean discrepancy), for nonparametric hypothesis testing problems involving distributions defined over…
We consider the problem of clustering a sample of probability distributions from a random distribution on $\mathbb R^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing…
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test…
We develop a kernel projected Wasserstein distance for the two-sample test, an essential building block in statistics and machine learning: given two sets of samples, to determine whether they are from the same distribution. This method…