Related papers: Subsampling Methods for genomic inference

A subsampling approach for large data sets when the Generalised Linear Model is potentially misspecified

Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial…

Methodology · Statistics 2025-10-08 Amalan Mahendran, Helen Thompson, James M. McGree

A model robust sub-sampling approach for Generalised Linear Models in Big data settings

In today's modern era of Big data, computationally efficient and scalable methods are needed to support timely insights and informed decision making. One such method is sub-sampling, where a subset of the Big data is analysed and used as…

Methodology · Statistics 2022-09-07 Amalan Mahendran, Helen Thompson, James M. McGree

Predictive Subsampling for Scalable Inference in Networks

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…

Methodology · Statistics 2026-02-19 Arpan Kumar, Minh Tang, Srijan Sengupta

Modern Subsampling Methods for Large-Scale Least Squares Regression

Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…

Methodology · Statistics 2021-05-05 Tao Li, Cheng Meng

Efficiently estimating small p-values in permutation tests using importance sampling and cross-entropy method

Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge…

Computation · Statistics 2023-08-29 Yang Shi, Huining Kang, Ji-Hyun Lee, Hui Jiang

Novel Subsampling Strategies for Heavily Censored Reliability Data

Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan, Zan Li, Zhaohui Li, Dennis K. J. Lin, Qingpei Hu, Dan Yu

Subsampling for General Statistics under Long Range Dependence with application to change point analysis

In the statistical inference for long range dependent time series the shape of the limit distribution typically depends on unknown parameters. Therefore, we propose to use subsampling. We show the validity of subsampling for general…

Statistics Theory · Mathematics 2016-10-20 Annika Betken, Martin Wendler

Selective Randomization Inference for Adaptive Experiments

Adaptive experiments use preliminary analyses of the data to inform further course of action and are commonly used in many disciplines including medical and social sciences. Because the null hypothesis and experimental design are…

Methodology · Statistics 2026-05-26 Tobias Freidling, Qingyuan Zhao, Zijun Gao

Multi-resolution subsampling for large-scale linear classification

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen, Holger Dette, Jun Yu

Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn

Permutation tests are amongst the most commonly used statistical tools in modern genomic research, a process by which p-values are attached to a test statistic by randomly permuting the sample or gene labels. Yet permutation p-values…

Applications · Statistics 2016-03-21 Belinda Phipson, Gordon K. Smyth

Fast Approximation of Small p-values in Permutation Tests by Partitioning the Permutations

Researchers in genetics and other life sciences commonly use permutation tests to evaluate differences between groups. Permutation tests have desirable properties, including exactness if data are exchangeable, and are applicable even when…

Computation · Statistics 2018-11-01 Brian Segal, Thomas Braun, Michael Elliott, Hui Jiang

Poisson Regression in one Covariate on Massive Data

The goal of subsampling is to select an informative subset of all observations, when using the full data for statistical analysis is not viable. We construct locally $D$-optimal subsampling designs under a Poisson regression model with a…

Statistics Theory · Mathematics 2024-03-28 Torsten Reuter, Rainer Schwabe

An extensive simulation study evaluating the interaction of resampling techniques across multiple causal discovery contexts

Despite the accelerating presence of exploratory causal analysis in modern science and medicine, the available non-experimental methods for validating causal models are not well characterized. One of the most popular methods is to evaluate…

Methodology · Statistics 2025-03-20 Ritwick Banerjee, Bryan Andrews, Erich Kummerfeld

Scalable subsampling: computation, aggregation and inference

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat{\theta}_n$ in order to conduct nonparametric inference such as the construction of confidence intervals…

Statistics Theory · Mathematics 2021-12-14 Dimitris N. Politis

Subsampling for Big Data Linear Models with Measurement Errors

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju, Mingqiu Wang, Shengli Zhao

Multiscale quantile segmentation

We introduce a new methodology for analyzing serial data by quantile regression assuming that the underlying quantile function consists of constant segments. The procedure does not rely on any distributional assumption besides serial…

Methodology · Statistics 2020-09-09 Laura Jula Vanegas, Merle Behr, Axel Munk

Adaptive Importance Sampling for Estimation in Structured Domains

Sampling is an important tool for estimating large, complex sums and integrals over high dimensional spaces. For instance, importance sampling has been used as an alternative to exact methods for inference in belief networks. Ideally, we…

Artificial Intelligence · Computer Science 2013-01-18 Luis E. Ortiz, Leslie Pack Kaelbling

Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances

This work presents a statistically principled method for estimating the required number of instances in the experimental comparison of multiple algorithms on a given problem class of interest. This approach generalises earlier results by…

Methodology · Statistics 2019-08-06 Felipe Campelo, Elizabeth F. Wanner

A Compound Decision Approach to Covariance Matrix Estimation

Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is sub-optimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings…

Methodology · Statistics 2022-06-06 Huiqin Xin, Sihai Dave Zhao

Tackling the subsampling problem to infer collective properties from limited data

Complex systems are fascinating because their rich macroscopic properties emerge from the interaction of many simple parts. Understanding the building principles of these emergent phenomena in nature requires assessing natural complex…

Neurons and Cognition · Quantitative Biology 2022-11-17 Anna Levina, Viola Priesemann, Johannes Zierenberg