Related papers: Classification accuracy as a proxy for two sample …
We consider the classification problem of a high-dimensional mixture of two Gaussians with general covariance matrices. Using the replica method from statistical physics, we investigate the asymptotic behavior of a general class of…
We provide improved differentially private algorithms for identity testing of high-dimensional distributions. Specifically, for $d$-dimensional Gaussian distributions with known covariance $\Sigma$, we can test whether the distribution…
We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, "our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian…
So-called linear rank statistics provide a means for distribution-free (even in finite samples), yet highly flexible, two-sample testing in the setting of univariate random variables. Their flexibility derives from a choice of weights that…
Bayesian nonparametric regression under a rescaled Gaussian process prior offers smoothness-adaptive function estimation with near minimax-optimal error rates. Hierarchical extensions of this approach, equipped with stochastic variable…
The task of the binary classification problem is to determine which of two distributions has generated a length-$n$ test sequence. The two distributions are unknown; two training sequences of length $N$, one from each distribution, are…
This paper introduces a statistical test inferring whether a variable allows separating two classes by means of a single critical value. Its test statistic is the prediction error of a nonparametric threshold classifier. While this approach…
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature,…
In analyzing high-dimensional models, sparsity of the model parameter is a common but often undesirable assumption. In this paper, we study the following two-sample testing problem: given two samples generated by two high-dimensional linear…
The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary…
In this article, we present a nonparametric method for the general two-sample problem involving functional random variables modelled as elements of a separable Hilbert space ${\cal H}$. First, we present a general recipe based on linear…
We develop a new permutation test for inference on a subvector of coefficients in linear models. The test is exact when the regressors and the error terms are independent. Then, we show that the test is asymptotically of correct level,…
In many social, economical, biological and medical studies, one objective is to classify a subject into one of several classes based on a set of variables observed from the subject. Because the probability distribution of the variables is…
The consistency of doubly robust estimators relies on consistent estimation of at least one of two nuisance regression parameters. In moderate to large dimensions, the use of flexible data-adaptive regression estimators may aid in achieving…
We study high-dimensional asymptotic performance limits of binary supervised classification problems where the class conditional densities are Gaussian with unknown means and covariances and the number of signal dimensions scales faster…
Complex phenomena in engineering and the sciences are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to estimate an…
The estimated accuracy of a classifier is a random quantity with variability. A common practice in supervised machine learning, is thus to test if the estimated accuracy is significantly better than chance level. This method of signal…
Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…
Simultaneous variable selection and statistical inference is challenging in high-dimensional data analysis. Most existing post-selection inference methods require explicitly specified regression models, which are often linear, as well as…
Often the question arises whether $Y$ can be predicted based on $X$ using a certain model. Especially for highly flexible models such as neural networks one may ask whether a seemingly good prediction is actually better than fitting pure…