Related papers: Two-sample Testing Using Deep Learning
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test…
The recent success of generative adversarial networks and variational learning suggests training a classifier network may work well in addressing the classical two-sample problem. Network-based tests have the computational advantage that…
Hypothesis testing is a statistical inference approach used to determine whether data supports a specific hypothesis. An important type is the two-sample test, which evaluates whether two sets of data points are from identical…
Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature,…
Two-sample tests are important in statistics and machine learning, both as tools for scientific discovery as well as to detect distribution shifts. This led to the development of many sophisticated test procedures going beyond the standard…
Deep neural two-sample tests have recently shown strong power for detecting distributional differences between groups, yet their black-box nature limits interpretability and practical adoption in biomedical analysis. Moreover, most existing…
Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…
In this paper, we address the problem of two-sample testing in the presence of missing data under a variety of missingness mechanisms. Our focus is on the well-known energy distance-based two-sample test. In addition to the standard…
The focus of this study is to evaluate the effectiveness of Machine Learning (ML) methods for two-sample testing with right-censored observations. To achieve this, we develop several ML-based methods with varying architectures and implement…
The two-sample problem, which consists in testing whether independent samples on $\mathbb{R}^d$ are drawn from the same (unknown) distribution, finds applications in many areas. Its study in high-dimension is the subject of much attention,…
We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses…
We consider the problem of two-sample testing in a semi-supervised setting with abundant unlabeled covariate data. Standard two-sample tests neglect covariate information, which has the potential to significantly boost performance. However,…
The two-sample hypothesis testing problem is studied for the challenging scenario of high dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification…
We consider testing for two-sample means of high dimensional populations by thresholding. Two tests are investigated, which are designed for better power performance when the two population mean vectors differ only in sparsely populated…
Learning effective data representations has been crucial in non-parametric two-sample testing. Common approaches will first split data into training and test sets and then learn data representations purely on the training set. However,…
We construct and analyze a neural network two-sample test to determine whether two datasets came from the same distribution (null hypothesis) or not (alternative hypothesis). We perform time-analysis on a neural tangent kernel (NTK)…
Modern kernel-based two-sample tests have shown great success in distinguishing complex, high-dimensional distributions with appropriate learned kernels. Previous work has demonstrated that this kernel learning procedure succeeds, assuming…
Machine learning and deep learning have been used extensively to classify physical surfaces through images and time-series contact data. However, these methods rely on human expertise and entail the time-consuming processes of data and…
Data depth has been applied as a nonparametric measurement for ranking multivariate samples. In this paper, we focus on homogeneity tests to assess whether two multivariate samples are from the same distribution. There are many data…