Related papers: Kernel Mean Embedding Based Hypothesis Tests for C…
Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of…
Recent advances in self-supervised learning and neural network scaling have enabled the creation of large models, known as foundation models, which can be easily adapted to a wide range of downstream tasks. The current paradigm for…
This study intends to introduce kernel mean embedding of probability measures over infinite-dimensional separable Hilbert spaces induced by functional response statistical models. The embedded function represents the concentration of…
Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used…
In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where…
Kernel means are frequently used to represent probability distributions in machine learning problems. In particular, the well known kernel density estimator and the kernel mean embedding both have the form of a kernel mean. Unfortunately,…
A Hilbert space embedding of a distribution---in short, a kernel mean embedding---has recently emerged as a powerful tool for machine learning and inference. The basic idea behind this framework is to map distributions into a reproducing…
While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable…
Computing a consensus object from a set of given objects is a core problem in machine learning and pattern recognition. One popular approach is to formulate it as an optimization problem using the generalized median. Previous methods like…
We present an operator-free, measure-theoretic approach to the conditional mean embedding (CME) as a random variable taking values in a reproducing kernel Hilbert space. While the kernel mean embedding of unconditional distributions has…
Depth measures are powerful tools for defining level sets in emerging, non--standard, and complex random objects such as high-dimensional multivariate data, functional data, and random graphs. Despite their favorable theoretical properties,…
Kernel mean embeddings -- integrals of a kernel with respect to a probability distribution -- are essential in Bayesian quadrature, but also widely used in other computational tools for numerical integration or for statistical inference…
The kernel mean embedding of probability distributions is commonly used in machine learning as an injective mapping from distributions to functions in an infinite dimensional Hilbert space. It allows us, for example, to define a distance…
An important feature of kernel mean embeddings (KME) is that the rate of convergence of the empirical KME to the true distribution KME can be bounded independently of the dimension of the space, properties of the distribution and smoothness…
We propose a novel kernel-based nonparametric two-sample test, employing the combined use of kernel mean and kernel covariance embedding. Our test builds on recent results showing how such combined embeddings map distinct probability…
Kernel techniques are among the most popular and flexible approaches in data science allowing to represent probability measures without loss of information under mild conditions. The resulting mapping called mean embedding gives rise to a…
Distance-based tests, also called "energy statistics", are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from "kernel mean embeddings", are leading methods for two-sample…
Kernel mean embeddings are a powerful tool to represent probability distributions over arbitrary spaces as single points in a Hilbert space. Yet, the cost of computing and storing such embeddings prohibits their direct use in large-scale…
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more…
In this paper we suggest two statistical hypothesis tests for the regression function of binary classification based on conditional kernel mean embeddings. The regression function is a fundamental object in classification as it determines…