Related papers: Subsampling Methods for genomic inference
Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial…
In today's modern era of Big data, computationally efficient and scalable methods are needed to support timely insights and informed decision making. One such method is sub-sampling, where a subset of the Big data is analysed and used as…
Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…
Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…
Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge…
Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…
In the statistical inference for long range dependent time series the shape of the limit distribution typically depends on unknown parameters. Therefore, we propose to use subsampling. We show the validity of subsampling for general…
Adaptive experiments use preliminary analyses of the data to inform further course of action and are commonly used in many disciplines including medical and social sciences. Because the null hypothesis and experimental design are…
Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…
Permutation tests are amongst the most commonly used statistical tools in modern genomic research, a process by which p-values are attached to a test statistic by randomly permuting the sample or gene labels. Yet permutation p-values…
Researchers in genetics and other life sciences commonly use permutation tests to evaluate differences between groups. Permutation tests have desirable properties, including exactness if data are exchangeable, and are applicable even when…
The goal of subsampling is to select an informative subset of all observations, when using the full data for statistical analysis is not viable. We construct locally $ D $-optimal subsampling designs under a Poisson regression model with a…
Despite the accelerating presence of exploratory causal analysis in modern science and medicine, the available non-experimental methods for validating causal models are not well characterized. One of the most popular methods is to evaluate…
Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat \theta _n$ in order to conduct nonparametric inference such as the construction of confidence intervals…
Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…
We introduce a new methodology for analyzing serial data by quantile regression assuming that the underlying quantile function consists of constant segments. The procedure does not rely on any distributional assumption besides serial…
Sampling is an important tool for estimating large, complex sums and integrals over high dimensional spaces. For instance, important sampling has been used as an alternative to exact methods for inference in belief networks. Ideally, we…
This work presents a statistically principled method for estimating the required number of instances in the experimental comparison of multiple algorithms on a given problem class of interest. This approach generalises earlier results by…
Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is sub-optimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings…
Complex systems are fascinating because their rich macroscopic properties emerge from the interaction of many simple parts. Understanding the building principles of these emergent phenomena in nature requires assessing natural complex…