Related papers: Two-Sample Testing in High-Dimensional Models
In this paper, we propose a new test for the equality of two population covariance matrices in the ultra-high dimensional setting, where the dimension is much larger than the sizes of both samples. Our proposed methodology…
We consider the hypothesis testing problem of detecting a shift between the means of two multivariate normal distributions in the high-dimensional setting, allowing for the data dimension p to exceed the sample size n. Specifically, we…
Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a…
We consider testing for two-sample means of high dimensional populations by thresholding. Two tests are investigated, which are designed for better power performance when the two population mean vectors differ only in sparsely populated…
We propose two tests for the equality of covariance matrices between two high-dimensional populations. One test is on the whole variance-covariance matrices, and the other is on off-diagonal sub-matrices, which define the covariance…
The classic likelihood ratio test for the equality of two covariance matrices breaks down due to the singularity of the sample covariance matrices when the data dimension $p$ is larger than the sample size $n$. In this paper, we…
Hypothesis testing in high dimensional data is a notoriously difficult problem without direct access to competing models' likelihood functions. This paper argues that statistical divergences can be used to quantify the difference between…
We introduce a powerful deep classifier two-sample test for high-dimensional data based on E-values, called E-value Classifier Two-Sample Test (E-C2ST). Our test combines ideas from existing work on split likelihood ratio tests and…
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature,…
A common problem in genetics is that of testing whether a set of highly dependent gene expressions differ between two populations, typically in a high-dimensional setting where the data dimension is larger than the sample size. Most…
We propose a likelihood-free method for comparing two distributions given samples from each, with the goal of assessing the quality of generative models. The proposed approach, PQMass, provides a statistically rigorous method for assessing…
The energy test is a powerful binning-free, multi-dimensional and distribution-free tool that can be applied to compare a measurement to a given prediction (goodness-of-fit) or to check whether two data samples originate from the same…
Many testing problems are readily amenable to randomised tests such as those employing data splitting. However, despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may…
In this paper, we consider testing the equality of probability vectors of two independent multinomial distributions in high dimension. The classical chi-square test may have some drawbacks in this case since many of the cell counts may be zero…
We propose optimal Bayesian two-sample tests for testing equality of high-dimensional mean vectors and covariance matrices between two populations. In many applications including genomics and medical imaging, it is natural to assume that…
Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…
Cluster analysis is a fundamental research issue in statistics and machine learning. In many modern clustering methods, we need to determine whether two subsets of samples come from the same cluster. Since these subsets are usually…
We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. Hotelling's classical $T^2$ test does not work for this "large $p$, small $n$" situation. The proposed test…
We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are…
High-dimensional statistical inference with general estimating equations is challenging and remains less explored. In this paper, we study two problems in the area: confidence set estimation for multiple components of the model parameters,…