Related papers: Classification accuracy as a proxy for two sample …

Statistical Inference in Classification of High-dimensional Gaussian Mixture

We consider the classification problem of a high-dimensional mixture of two Gaussians with general covariance matrices. Using the replica method from statistical physics, we investigate the asymptotic behavior of a general class of…

Machine Learning · Statistics 2024-10-29 Hanwen Huang, Peng Zeng

Private High-Dimensional Hypothesis Testing

We provide improved differentially private algorithms for identity testing of high-dimensional distributions. Specifically, for $d$-dimensional Gaussian distributions with known covariance $\Sigma$, we can test whether the distribution…

Data Structures and Algorithms · Computer Science 2022-07-26 Shyam Narayanan

Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$

We revisit the problem of estimating the mean of a real-valued distribution, presenting a novel estimator with sub-Gaussian convergence: intuitively, "our estimator, on any distribution, is as accurate as the sample mean is for the Gaussian…

Statistics Theory · Mathematics 2020-11-18 Jasper C. H. Lee, Paul Valiant

Exact and efficient multivariate two-sample tests through generalized linear rank statistics

So-called linear rank statistics provide a means for distribution-free (even in finite samples), yet highly flexible, two-sample testing in the setting of univariate random variables. Their flexibility derives from a choice of weights that…

Methodology · Statistics 2023-10-03 Dan D. Erdmann-Pham

Variable Selection Consistency of Gaussian Process Regression

Bayesian nonparametric regression under a rescaled Gaussian process prior offers smoothness-adaptive function estimation with near minimax-optimal error rates. Hierarchical extensions of this approach, equipped with stochastic variable…

Statistics Theory · Mathematics 2020-12-15 Sheng Jiang, Surya T. Tokdar

Classification with High-Dimensional Sparse Samples

The task of the binary classification problem is to determine which of two distributions has generated a length-$n$ test sequence. The two distributions are unknown; two training sequences of length $N$, one from each distribution, are…

Information Theory · Computer Science 2016-04-18 Dayu Huang, Sean Meyn

On an Exact and Nonparametric Test for the Separability of Two Classes by Means of a Simple Threshold

This paper introduces a statistical test inferring whether a variable allows separating two classes by means of a single critical value. Its test statistic is the prediction error of a nonparametric threshold classifier. While this approach…

Methodology · Statistics 2017-07-17 Fabian Schroeder

Global and Local Two-Sample Tests via Regression

Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature,…

Methodology · Statistics 2019-11-19 Ilmun Kim, Ann B. Lee, Jing Lei

Two-sample testing in non-sparse high-dimensional linear models

In analyzing high-dimensional models, sparsity of the model parameter is a common but often undesirable assumption. In this paper, we study the following two-sample testing problem: given two samples generated by two high-dimensional linear…

Statistics Theory · Mathematics 2017-08-16 Yinchu Zhu, Jelena Bradic

Revisiting Classifier Two-Sample Tests

The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary…

Machine Learning · Statistics 2018-03-14 David Lopez-Paz, Maxime Oquab

Testing distributional equality for functional random variables

In this article, we present a nonparametric method for the general two-sample problem involving functional random variables modelled as elements of a separable Hilbert space ${\cal H}$. First, we present a general recipe based on linear…

Methodology · Statistics 2024-10-08 Bilol Banerjee

A Robust Permutation Test for Subvector Inference in Linear Regressions

We develop a new permutation test for inference on a subvector of coefficients in linear models. The test is exact when the regressors and the error terms are independent. Then, we show that the test is asymptotically of correct level,…

Econometrics · Economics 2023-09-13 Xavier D'Haultfœuille, Purevdorj Tuvaandorj

Sparse linear discriminant analysis by thresholding for high dimensional data

In many social, economical, biological and medical studies, one objective is to classify a subject into one of several classes based on a set of variables observed from the subject. Because the probability distribution of the variables is…

Statistics Theory · Mathematics 2011-05-19 Jun Shao, Yazhen Wang, Xinwei Deng, Sijian Wang

Statistical Inference for Data-adaptive Doubly Robust Estimators with Survival Outcomes

The consistency of doubly robust estimators relies on consistent estimation of at least one of two nuisance regression parameters. In moderate to large dimensions, the use of flexible data-adaptive regression estimators may aid in achieving…

Machine Learning · Statistics 2019-01-30 Iván Díaz

An Impossibility Result for High Dimensional Supervised Learning

We study high-dimensional asymptotic performance limits of binary supervised classification problems where the class conditional densities are Gaussian with unknown means and covariances and the number of signal dimensions scales faster…

Machine Learning · Statistics 2016-11-17 Mohammad Hossein Rohban, Prakash Ishwar, Birant Orten, William C. Karl, Venkatesh Saligrama

Validation of Approximate Likelihood and Emulator Models for Computationally Intensive Simulations

Complex phenomena in engineering and the sciences are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to estimate an…

Methodology · Statistics 2020-06-18 Niccolò Dalmasso, Ann B. Lee, Rafael Izbicki, Taylor Pospisil, Ilmun Kim, Chieh-An Lin

Better-Than-Chance Classification for Signal Detection

The estimated accuracy of a classifier is a random quantity with variability. A common practice in supervised machine learning is thus to test if the estimated accuracy is significantly better than chance level. This method of signal…

Methodology · Statistics 2020-01-28 Jonathan D. Rosenblatt, Yuval Benjamini, Roee Gilron, Roy Mukamel, Jelle J. Goeman

Two-Sample Test Based on Classification Probability

Robust classification algorithms have been developed in recent years with great success. We take advantage of this development and recast the classical two-sample test problem in the framework of classification. Based on the estimates of…

Statistics Theory · Mathematics 2019-09-18 Haiyan Cai, Bryan Goggin, Qingtang Jiang

High-dimensional Statistical Inference and Variable Selection Using Sufficient Dimension Association

Simultaneous variable selection and statistical inference is challenging in high-dimensional data analysis. Most existing post-selection inference methods require explicitly specified regression models, which are often linear, as well as…

Methodology · Statistics 2026-03-19 Shangyuan Ye, Shauna Rakshe, Ye Liang

Testing for no effect in regression problems: a permutation approach

Often the question arises whether $Y$ can be predicted based on $X$ using a certain model. Especially for highly flexible models such as neural networks one may ask whether a seemingly good prediction is actually better than fitting pure…

Methodology · Statistics 2024-04-30 Michał Ciszewski, Jakob Söhl, Ton Leenen, Bart van Trigt, Geurt Jongbloed