Related papers: Properties of higher criticism under strong depend…
Higher criticism is a method for detecting signals that are both sparse and weak. Although first proposed in cases where the noise variables are independent, higher criticism also has reasonable performance in settings where those variables…
In a bivariate setting, we consider the problem of detecting a sparse contamination or mixture component, where the effect manifests itself as a positive dependence between the variables, which are otherwise independent in the main…
The detection of weak and rare effects in large amounts of data arises in a number of modern data analysis problems. Known results show that in this situation the potential of statistical inference is severely limited by the large-scale…
Higher criticism is a large-scale testing procedure that can attain the optimal detection boundary for sparse and faint signals. However, there has been a lack of knowledge in most existing works about its asymptotic distribution for more…
In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was…
Detecting anomalies in large sets of observations is crucial in various applications, such as epidemiological studies, gene expression studies, and systems monitoring. We consider settings where the units of interest result in multiple…
In this paper, we focus on the problem of statistical dependence estimation using characteristic functions. We propose a statistical dependence measure, based on the maximum-norm of the difference between joint and product-marginal…
Statistical dependence between hypotheses poses a significant challenge to the stability of large scale multiple hypotheses testing. Ignoring it often results in an unacceptably large spread in the false positive proportion even though the…
Signal identification in large-dimensional settings is a challenging problem in biostatistics. Recently, the method of higher criticism (HC) was shown to be an effective means for determining appropriate decision thresholds. Here, we study…
Refining one's hypotheses in the light of data is a common scientific practice; however, the dependency on the data introduces selection bias and can lead to specious statistical analysis. An approach for addressing this is via conditioning…
Accurately estimating the proportion of true signals among a large number of variables is crucial for enhancing the precision and reliability of scientific research. Traditional signal proportion estimators often assume independence among…
Many tools exist to detect dependence between random variables, a core question across a wide range of machine learning, statistical, and scientific endeavors. Although several statistical tests guarantee eventual detection of any…
Measuring the statistical dependence between observed signals is a primary tool for scientific discovery. However, biological systems often exhibit complex non-linear interactions that currently cannot be captured without a priori knowledge…
As contemporary software-intensive systems reach increasingly large scale, it is imperative that failure detection schemes be developed to help prevent costly system downtimes. A promising direction towards the construction of such schemes…
Independence screening methods such as the two sample $t$-test and the marginal correlation based ranking are among the most widely used techniques for variable selection in ultrahigh dimensional data sets. In this short note, simple…
We consider the scenario where important signals are not strong enough to be separable from a large amount of noise. Such weak signals commonly exist in large-scale data analysis and play vital roles in many biomedical applications.…
Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional dependence of the target on any set of other attributes in the data? How can we reliably, without bias to…
In this paper, we consider the problem of testing independence in high-dimensional settings with missing data. Building upon a recently proposed Kendall-based statistic, we introduce two new modifications specifically designed to…
In this article, we consider the problem of testing the independence between two random variables. Our primary objective is to develop tests that are highly effective at detecting associations arising from explicit or implicit functional…
We consider two alternative tests to the Higher Criticism test of Donoho and Jin [Ann. Statist. 32 (2004) 962-994] for high-dimensional means under the sparsity of the nonzero means for sub-Gaussian distributed data with unknown column-wise…