Related papers: A Scalable Bootstrap for Massive Data

The Big Data Bootstrap

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we…

Machine Learning · Computer Science 2012-07-03 Ariel Kleiner , Ameet Talwalkar , Purnamrita Sarkar , Michael Jordan

A subsampled double bootstrap for massive data

The bootstrap is a popular and powerful method for assessing precision of estimators and inferential methods. However, for massive datasets which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation and its…

Methodology · Statistics 2015-08-06 Srijan Sengupta , Stanislav Volgushev , Xiaofeng Shao

Hyperparameter Selection for Subsampling Bootstraps

As massive data analysis becomes increasingly prevalent, subsampling methods like BLB (Bag of Little Bootstraps) serve as powerful tools for assessing the quality of estimators for massive data. However, the performance of the subsampling…

Methodology · Statistics 2022-01-14 Yingying Ma , Hansheng Wang

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…

Methodology · Statistics 2023-02-16 Yingying Ma , Chenlei Leng , Hansheng Wang

Variable Selection with Scalable Bootstrap in Generalized Linear Model for Massive Data

Bootstrap is commonly used as a tool for non-parametric statistical inference to estimate meaningful parameters in Variable Selection Models. However, for massive datasets that grow at an exponential rate, the computation of Bootstrap…

Computation · Statistics 2016-12-26 Zhibing He , Yichen Qin , Ben-Chang Shia , Yang Li

Robust, scalable and fast bootstrap method for analyzing large scale data

In this paper we address the problem of performing statistical inference for large scale data sets i.e., Big Data. The volume and dimensionality of the data may be so high that it cannot be processed or stored in a single computing node. We…

Methodology · Statistics 2016-04-20 Shahab Basiri , Esa Ollila , Visa Koivunen

Bootstrap in High Dimension with Low Computation

The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We…

Methodology · Statistics 2023-06-21 Henry Lam , Zhenyuan Liu

Cheap Subsampling bootstrap confidence intervals for fast and robust inference

Bootstrapping is often applied to get confidence limits for semiparametric inference of a target parameter in the presence of nuisance parameters. Bootstrapping with replacement can be computationally expensive and problematic when…

Methodology · Statistics 2025-03-06 Johan Sebastian Ohlendorff , Anders Munch , Kathrine Kold Sørensen , Thomas Alexander Gerds

Scalable Efficient Inference in Complex Surveys through Targeted Resampling of Weights

Survey data often arises from complex sampling designs, such as stratified or multistage sampling, with unequal inclusion probabilities. When sampling is informative, traditional inference methods yield biased estimators and poor coverage.…

Methodology · Statistics 2025-04-17 Snigdha Das , Dipankar Bandyopadhyay , Debdeep Pati

Fast Uncertainty Quantification for Kernel-Based Estimators in Large-Scale Causal Inference

Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As…

Methodology · Statistics 2026-03-17 Matthew Kosko , Falco J. Bargagli-Stoffi , Lin Wang , Michele Santacatterina

Simultaneous Inference for Massive Data: Distributed Bootstrap

In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling,…

Machine Learning · Statistics 2020-02-21 Yang Yu , Shih-Kang Chao , Guang Cheng

A Cheap Bootstrap Method for Fast Inference

The bootstrap is a versatile inference method that has proven powerful in many statistical problems. However, when applied to modern large-scale models, it could face substantial computation demand from repeated data resampling and model…

Methodology · Statistics 2022-02-02 Henry Lam

Debiasing the Debiased Lasso with Bootstrap

We consider statistical inference for a single coordinate of regression coefficients in high-dimensional linear models. Recently, the debiased estimators are popularly used for constructing confidence intervals and hypothesis testing in…

Statistics Theory · Mathematics 2020-10-20 Sai Li

Scalable Resampling in Massive Generalized Linear Models via Subsampled Residual Bootstrap

Residual bootstrap is a classical method for statistical inference in regression settings. With massive data sets becoming increasingly common, there is a demand for computationally efficient alternatives to residual bootstrap. We propose a…

Methodology · Statistics 2024-09-30 Indrila Ganguly , Srijan Sengupta , Sujit Ghosh

Bootstrap Methods in Econometrics

The bootstrap is a method for estimating the distribution of an estimator or test statistic by re-sampling the data or a model estimated from the data. Under conditions that hold in a wide variety of econometric applications, the bootstrap…

Econometrics · Economics 2018-09-12 Joel L. Horowitz

Scalable subsampling: computation, aggregation and inference

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat \theta _n$ in order to conduct nonparametric inference such as the construction of confidence intervals…

Statistics Theory · Mathematics 2021-12-14 Dimitris N. Politis

Bayesian Bootstraps for Massive Data

In this article, we present data-subsetting algorithms that allow for the approximate and scalable implementation of the Bayesian bootstrap. They are analogous to two existing algorithms in the frequentist literature: the bag of little…

Computation · Statistics 2019-03-25 Andrés F. Barrientos , Víctor Peña

Subsampling (weighted smooth) empirical copula processes

A key tool to carry out inference on the unknown copula when modeling a continuous multivariate distribution is a nonparametric estimator known as the empirical copula. One popular way of approximating its sampling distribution consists of…

Statistics Theory · Mathematics 2023-02-01 Ivan Kojadinovic , Kristina Stemikovskaya

On the Properties of Simulation-based Estimators in High Dimensions

Considering the increasing size of available data, the need for statistical methods that control the finite sample bias is growing. This is mainly due to the frequent settings where the number of variables is large and allowed to increase…

Statistics Theory · Mathematics 2018-10-12 Stéphane Guerrier , Mucyo Karemera , Samuel Orso , Maria-Pia Victoria-Feser

Weighted Bayesian Bootstrap for Scalable Bayes

We develop a weighted Bayesian Bootstrap (WBB) for machine learning and statistics. WBB provides uncertainty quantification by sampling from a high dimensional posterior distribution. WBB is computationally fast and scalable using only…

Methodology · Statistics 2021-04-06 Michael Newton , Nicholas G. Polson , Jianeng Xu