Related papers: Kernel Two-Sample Hypothesis Testing Using Kernel …
We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources such as groups of observations in time series, collections of words in text or a batch of…
In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where…
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test…
Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming…
Kernel two-sample tests have been widely used for multivariate data to test equality of distributions. However, existing tests based on mapping distributions into a reproducing kernel Hilbert space mainly target specific alternatives and do…
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over…
Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail the time-consuming processes of data and…
We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are…
We propose a new one-sample test for normality in a Reproducing Kernel Hilbert Space (RKHS). Namely, we test the null-hypothesis of belonging to a given family of Gaussian distributions. Hence our procedure may be applied either to test…
We study strictly proper scoring rules in the Reproducing Kernel Hilbert Space. We propose a general Kernel Scoring rule and associated Kernel Divergence. We consider conditions under which the Kernel Score is strictly proper. We then…
Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the…
Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…
Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to development of a range of robust methods which directly account for this issue. However, whether these more…
The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary…
Anomaly detection based on one-class classification algorithms is broadly used in many applied domains like image processing (e.g. detection of whether a patient is "cancerous" or "healthy" from mammography image), network intrusion…
We consider the variable selection problem for two-sample tests, aiming to select the most informative variables to determine whether two collections of samples follow the same distribution. To address this, we propose a novel framework…
Kernel two-sample tests have been widely used, and the development of efficient methods for high-dimensional, large-scale data is receiving increasing attention in the big data era. However, existing methods, such as the maximum mean…
We consider the problem of two-sample testing in a semi-supervised setting with abundant unlabeled covariate data. Standard two-sample tests neglect covariate information, which has the potential to significantly boost performance. However,…
Rejecting the null hypothesis in two-sample testing is a fundamental tool for scientific discovery. Yet, aside from concluding that two samples do not come from the same probability distribution, it is often of interest to characterize how…
In supervised learning with distributional inputs in the two-stage sampling setup, relevant to applications like learning-based medical screening or causal learning, the inputs (which are probability distributions) are not accessible in the…