Related papers: Robustness of multiple testing procedures against …
Particularly in genomics, but also in other fields, it has become commonplace to undertake highly multiple Student's $t$-tests based on relatively small sample sizes. The literature on this topic is continually expanding, but the main…
In this article, we consider the problem of simultaneous testing of hypotheses when the individual test statistics are not necessarily independent. Specifically, we consider the problem of simultaneous testing of point null hypotheses…
Clustering is part of unsupervised analysis methods that consist in grouping samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to…
Statistical dependence between hypotheses poses a significant challenge to the stability of large scale multiple hypotheses testing. Ignoring it often results in an unacceptably large spread in the false positive proportion even though the…
It is quite common in modern research, for a researcher to test many hypotheses. The statistical (frequentist) hypothesis testing framework, does not scale with the number of hypotheses in the sense that naively performing many hypothesis…
Cluster analysis is a fundamental research issue in statistics and machine learning. In many modern clustering methods, we need to determine whether two subsets of samples come from the same cluster. Since these subsets are usually…
There is a well-known problem in Null Hypothesis Significance Testing: many statistically significant results fail to replicate in subsequent experiments. We show that this problem arises because standard `point-form null' significance…
Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample…
We consider the problem of hypothesis testing in the situation where the first hypothesis is simple and the second one is local one-sided composite. We describe the choice of the thresholds and the power functions of different tests when…
Many multiple testing procedures make use of the p-values from the individual pairs of hypothesis tests, and are valid if the p-value statistics are independent and uniformly distributed under the null hypotheses. However, it has recently…
Testing independence among a number of (ultra) high-dimensional random samples is a fundamental and challenging problem. By arranging $n$ identically distributed $p$-dimensional random vectors into a $p \times n$ data matrix, we investigate…
Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $\epsilon$-far from all…
It is often of interest to test a global null hypothesis using multiple, possibly dependent $p$-values by combining their strengths while controlling the type-I error. Recently, several heavy-tailed combination tests, such as the harmonic…
We study the problems of sequential nonparametric two-sample and independence testing. Sequential tests process data online and allow using observed data to decide whether to stop and reject the null hypothesis or to collect more data,…
We consider the problems of hypothesis testing on a probability measure of independent sample, on solution of ill-posed problem, on deconvolution problem and on Poisson mean measure. For all these setups necessary conditions and sufficient…
Test of independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data is high dimensional and the potential dependence signal is…
Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we study differentially private tests of independence between a categorical and a continuous variable. We take…
Statistical significance testing of differences in values of metrics like recall, precision and balanced F-score is a necessary part of empirical natural language processing. Unfortunately, we find in a set of experiments that many commonly…
Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are…
The problem of joint sequential detection and isolation is considered in the context of multiple, not necessarily independent, data streams. A multiple testing framework is proposed, where each hypothesis corresponds to a different subset…