Related papers: General Frameworks for Conditional Two-Sample Test…
A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample…
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature,…
Symmetry plays a central role in the sciences, machine learning, and statistics. While statistical tests for the presence of distributional invariance with respect to groups have a long history, tests for conditional symmetry in the form of…
We introduce credal two-sample testing, a new hypothesis testing framework for comparing credal sets -- convex sets of probability measures where each element captures aleatoric uncertainty and the set itself represents epistemic…
Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…
We present a general nonparametric approach for testing whether a statistical parameter defined through conditional distributions is constant across the conditioning variables. Such hypotheses arise naturally in problems such as assessing…
We consider the problem of testing the equality of conditional distributions of a response variable given a vector of covariates between two populations. Such a hypothesis testing problem can be motivated from various machine learning and…
We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data,…
We study a new framework for property testing of probability distributions, by considering distribution testing algorithms that have access to a conditional sampling oracle.* This is an oracle that takes as input a subset $S \subseteq [N]$…
Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current…
Rejecting the null hypothesis in two-sample testing is a fundamental tool for scientific discovery. Yet, aside from concluding that two samples do not come from the same probability distribution, it is often of interest to characterize how…
Testing the equality of two conditional distributions is crucial in various modern applications, including transfer learning and causal inference. Despite its importance, this fundamental problem has received surprisingly little attention…
In this paper, we propose a general framework for testing the conditional distribution equality in a two-sample problem, which is most relevant to covariate shift and causal discovery. Our framework is built on neural network-based…
We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources such as groups of observations in time series, collections of words in text or a batch of…
Deep neural two-sample tests have recently shown strong power for detecting distributional differences between groups, yet their black-box nature limits interpretability and practical adoption in biomedical analysis. Moreover, most existing…
The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary…
We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test…
We study the problem of designing consistent sequential two-sample tests in a nonparametric setting. Guided by the principle of testing by betting, we reframe this task into that of selecting a sequence of payoff functions that maximize the…
Two-sample inference for the difference of population means typically relies upon a Central Limit Theorem approximation. When data are drawn from a Negative Binomial distribution, previous work of Shilane et al. (2010) showed that a Normal…
A number of applications require two-sample testing on ranked preference data. For instance, in crowdsourcing, there is a long-standing question of whether pairwise comparison data provided by people is distributed similar to…