Related papers: A subsampled double bootstrap for massive data

A Scalable Bootstrap for Massive Data

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets---which are increasingly prevalent---the computation of bootstrap-based quantities can be prohibitively…

Methodology · Statistics 2012-06-29 Ariel Kleiner , Ameet Talwalkar , Purnamrita Sarkar , Michael I. Jordan

The Big Data Bootstrap

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we…

Machine Learning · Computer Science 2012-07-03 Ariel Kleiner , Ameet Talwalkar , Purnamrita Sarkar , Michael Jordan

Robust, scalable and fast bootstrap method for analyzing large scale data

In this paper we address the problem of performing statistical inference for large scale data sets i.e., Big Data. The volume and dimensionality of the data may be so high that it cannot be processed or stored in a single computing node. We…

Methodology · Statistics 2016-04-20 Shahab Basiri , Esa Ollila , Visa Koivunen

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…

Methodology · Statistics 2023-02-16 Yingying Ma , Chenlei Leng , Hansheng Wang

Hyperparameter Selection for Subsampling Bootstraps

Massive data analysis becomes increasingly prevalent, subsampling methods like BLB (Bag of Little Bootstraps) serves as powerful tools for assessing the quality of estimators for massive data. However, the performance of the subsampling…

Methodology · Statistics 2022-01-14 Yingying Ma , Hansheng Wang

A Cheap Bootstrap Method for Fast Inference

The bootstrap is a versatile inference method that has proven powerful in many statistical problems. However, when applied to modern large-scale models, it could face substantial computation demand from repeated data resampling and model…

Methodology · Statistics 2022-02-02 Henry Lam

Simultaneous Inference for Massive Data: Distributed Bootstrap

In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling,…

Machine Learning · Statistics 2020-02-21 Yang Yu , Shih-Kang Chao , Guang Cheng

Bootstrap in High Dimension with Low Computation

The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We…

Methodology · Statistics 2023-06-21 Henry Lam , Zhenyuan Liu

Cheap Subsampling bootstrap confidence intervals for fast and robust inference

Bootstrapping is often applied to get confidence limits for semiparametric inference of a target parameter in the presence of nuisance parameters. Bootstrapping with replacement can be computationally expensive and problematic when…

Methodology · Statistics 2025-03-06 Johan Sebastian Ohlendorff , Anders Munch , Kathrine Kold Sørensen , Thomas Alexander Gerds

Bayesian Bootstraps for Massive Data

In this article, we present data-subsetting algorithms that allow for the approximate and scalable implementation of the Bayesian bootstrap. They are analogous to two existing algorithms in the frequentist literature: the bag of little…

Computation · Statistics 2019-03-25 Andrés F. Barrientos , Víctor Peña

Statistical inference in massive datasets by empirical likelihood

In this paper, we propose a new statistical inference method for massive data sets, which is very simple and efficient by combining divide-and-conquer method and empirical likelihood. Compared with two popular methods (the bag of little…

Methodology · Statistics 2020-04-21 Xuejun Ma , Shaochen Wang , Wang Zhou

Variable Selection with Scalable Bootstrap in Generalized Linear Model for Massive Data

Bootstrap is commonly used as a tool for non-parametric statistical inference to estimate meaningful parameters in Variable Selection Models. However, for massive dataset that has exponential growth rate, the computation of Bootstrap…

Computation · Statistics 2016-12-26 Zhibing He , Yichen Qin , Ben-Chang Shia , Yang Li

Scalable Resampling in Massive Generalized Linear Models via Subsampled Residual Bootstrap

Residual bootstrap is a classical method for statistical inference in regression settings. With massive data sets becoming increasingly common, there is a demand for computationally efficient alternatives to residual bootstrap. We propose a…

Methodology · Statistics 2024-09-30 Indrila Ganguly , Srijan Sengupta , Sujit Ghosh

Scalable Efficient Inference in Complex Surveys through Targeted Resampling of Weights

Survey data often arises from complex sampling designs, such as stratified or multistage sampling, with unequal inclusion probabilities. When sampling is informative, traditional inference methods yield biased estimators and poor coverage.…

Methodology · Statistics 2025-04-17 Snigdha Das , Dipankar Bandyopadhyay , Debdeep Pati

Gap bootstrap methods for massive data sets with an application to transportation engineering

In this paper we describe two bootstrap methods for massive data sets. Naive applications of common resampling methodology are often impractical for massive data sets due to computational burden and due to complex patterns of inhomogeneity.…

Applications · Statistics 2013-01-14 S. N. Lahiri , C. Spiegelman , J. Appiah , L. Rilett

An Online Bootstrap for Time Series

Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series…

Machine Learning · Statistics 2024-02-28 Nicolai Palm , Thomas Nagler

Scalable subsampling: computation, aggregation and inference

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat \theta _n$ in order to conduct nonparametric inference such as the construction of confidence intervals…

Statistics Theory · Mathematics 2021-12-14 Dimitris N. Politis

Bootstrap Methods in Econometrics

The bootstrap is a method for estimating the distribution of an estimator or test statistic by re-sampling the data or a model estimated from the data. Under conditions that hold in a wide variety of econometric applications, the bootstrap…

Econometrics · Economics 2018-09-12 Joel L. Horowitz

Subsampling (weighted smooth) empirical copula processes

A key tool to carry out inference on the unknown copula when modeling a continuous multivariate distribution is a nonparametric estimator known as the empirical copula. One popular way of approximating its sampling distribution consists of…

Statistics Theory · Mathematics 2023-02-01 Ivan Kojadinovic , Kristina Stemikovskaya

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap is a useful tool for making statistical inference, but it may provide erroneous results under complex survey sampling. Most studies about bootstrap-based inference are developed under simple random sampling and stratified random…

Statistics Theory · Mathematics 2019-01-08 Zhonglei Wang , Jae Kwang Kim , Liuhua Peng