Related papers: Kernel-Based Tests for Likelihood-Free Hypothesis …
Consider the problem of binary hypothesis testing. Given $Z$ coming from either $\mathbb P^{\otimes m}$ or $\mathbb Q^{\otimes m}$, to decide between the two with small probability of error it is sufficient, and in many cases necessary, to…
The two-sample hypothesis testing problem is studied for the challenging scenario of high dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification…
In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where…
We propose a novel kernel-based nonparametric two-sample test, employing the combined use of kernel mean and kernel covariance embedding. Our test builds on recent results showing how such combined embeddings map distinct probability…
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test…
Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming…
Two-sample hypothesis testing-determining whether two sets of data are drawn from the same distribution-is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing,…
We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations are close to a low-dimensional manifold. We characterize…
Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current…
Emerging applications of sensor networks for detection sometimes suggest that classical problems ought be revisited under new assumptions. This is the case of binary hypothesis testing with independent - but not necessarily identically…
The task of the binary classification problem is to determine which of two distributions has generated a length-$n$ test sequence. The two distributions are unknown; two training sequences of length $N$, one from each distribution, are…
In this paper we introduce a kernel-based measure for detecting differences between two conditional distributions. Using the `kernel trick' and nearest-neighbor graphs, we propose a consistent estimate of this measure which can be computed…
Label Proportion Learning (LLP) addresses the classification problem where multiple instances are grouped into bags and each bag contains information about the proportion of each class. However, in practical applications, obtaining precise…
We propose a nonparametric two-sample test procedure based on Maximum Mean Discrepancy (MMD) for testing the hypothesis that two samples of functions have the same underlying distribution, using kernels defined on function spaces. This…
In most real-world scenarios, labeled training datasets are highly class-imbalanced, where deep neural networks suffer from generalizing to a balanced testing criterion. In this paper, we explore a novel yet simple way to alleviate this…
In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some…
Sequential hypothesis testing is a desirable decision making strategy in any time sensitive scenario. Compared with fixed sample-size testing, sequential testing is capable of achieving identical probability of error requirements using less…
We study the open-set label shift problem, where the test data may include a novel class absent from training. This setting is challenging because both the class proportions and the distribution of the novel class are not identifiable…
We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources such as groups of observations in time series, collections of words in text or a batch of…
Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare…