Related papers: Simultaneous Inference for Massive Data: Distribut…

Distributed Bootstrap for Simultaneous Inference Under High Dimensionality

We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines. The method produces an $\ell_\infty$-norm confidence region based on a…

Methodology · Statistics 2022-06-15 Yang Yu , Shih-Kang Chao , Guang Cheng

A subsampled double bootstrap for massive data

The bootstrap is a popular and powerful method for assessing precision of estimators and inferential methods. However, for massive datasets which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation and its…

Methodology · Statistics 2015-08-06 Srijan Sengupta , Stanislav Volgushev , Xiaofeng Shao

A Cheap Bootstrap Method for Fast Inference

The bootstrap is a versatile inference method that has proven powerful in many statistical problems. However, when applied to modern large-scale models, it could face substantial computation demand from repeated data resampling and model…

Methodology · Statistics 2022-02-02 Henry Lam

Communication-Efficient and Memory-Aware Parallel Bootstrapping using MPI

Bootstrapping is a powerful statistical resampling technique for estimating the sampling distribution of an estimator. However, its computational cost becomes prohibitive for large datasets or a high number of resamples. This paper presents…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Di Zhang

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…

Methodology · Statistics 2023-02-16 Yingying Ma , Chenlei Leng , Hansheng Wang

An Online Bootstrap for Time Series

Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series…

Machine Learning · Statistics 2024-02-28 Nicolai Palm , Thomas Nagler

Distributed Statistical Inference for Massive Data

This paper considers distributed statistical inference for general symmetric statistics %that encompasses the U-statistics and the M-estimators in the context of massive data where the data can be stored at multiple platforms in different…

Statistics Theory · Mathematics 2018-05-30 Song Xi Chen , Liuhua Peng

Robust, scalable and fast bootstrap method for analyzing large scale data

In this paper we address the problem of performing statistical inference for large scale data sets i.e., Big Data. The volume and dimensionality of the data may be so high that it cannot be processed or stored in a single computing node. We…

Methodology · Statistics 2016-04-20 Shahab Basiri , Esa Ollila , Visa Koivunen

Bayesian Bootstraps for Massive Data

In this article, we present data-subsetting algorithms that allow for the approximate and scalable implementation of the Bayesian bootstrap. They are analogous to two existing algorithms in the frequentist literature: the bag of little…

Computation · Statistics 2019-03-25 Andrés F. Barrientos , Víctor Peña

Iterative Distributed Multinomial Regression

This article introduces an iterative distributed computing estimator for the multinomial logistic regression model with large choice sets. Compared to the maximum likelihood estimator, the proposed iterative distributed estimator achieves…

Econometrics · Economics 2024-12-03 Yanqin Fan , Yigit Okar , Xuetao Shi

Gap bootstrap methods for massive data sets with an application to transportation engineering

In this paper we describe two bootstrap methods for massive data sets. Naive applications of common resampling methodology are often impractical for massive data sets due to computational burden and due to complex patterns of inhomogeneity.…

Applications · Statistics 2013-01-14 S. N. Lahiri , C. Spiegelman , J. Appiah , L. Rilett

Statistical inference in massive datasets by empirical likelihood

In this paper, we propose a new statistical inference method for massive data sets, which is very simple and efficient by combining divide-and-conquer method and empirical likelihood. Compared with two popular methods (the bag of little…

Methodology · Statistics 2020-04-21 Xuejun Ma , Shaochen Wang , Wang Zhou

A Scalable Bootstrap for Massive Data

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets---which are increasingly prevalent---the computation of bootstrap-based quantities can be prohibitively…

Methodology · Statistics 2012-06-29 Ariel Kleiner , Ameet Talwalkar , Purnamrita Sarkar , Michael I. Jordan

Bootstrap in High Dimension with Low Computation

The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We…

Methodology · Statistics 2023-06-21 Henry Lam , Zhenyuan Liu

Scalable Resampling in Massive Generalized Linear Models via Subsampled Residual Bootstrap

Residual bootstrap is a classical method for statistical inference in regression settings. With massive data sets becoming increasingly common, there is a demand for computationally efficient alternatives to residual bootstrap. We propose a…

Methodology · Statistics 2024-09-30 Indrila Ganguly , Srijan Sengupta , Sujit Ghosh

Spinning Fast Iterative Data Flows

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…

Databases · Computer Science 2012-08-02 Stephan Ewen , Kostas Tzoumas , Moritz Kaufmann , Volker Markl

Memory-efficient array redistribution through portable collective communication

Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-29 Norman A. Rink , Adam Paszke , Dimitrios Vytiniotis , Georg Stefan Schmid

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap is a useful tool for making statistical inference, but it may provide erroneous results under complex survey sampling. Most studies about bootstrap-based inference are developed under simple random sampling and stratified random…

Statistics Theory · Mathematics 2019-01-08 Zhonglei Wang , Jae Kwang Kim , Liuhua Peng

Cheap Subsampling bootstrap confidence intervals for fast and robust inference

Bootstrapping is often applied to get confidence limits for semiparametric inference of a target parameter in the presence of nuisance parameters. Bootstrapping with replacement can be computationally expensive and problematic when…

Methodology · Statistics 2025-03-06 Johan Sebastian Ohlendorff , Anders Munch , Kathrine Kold Sørensen , Thomas Alexander Gerds

A Higher-Order Correct Fast Moving-Average Bootstrap for Dependent Data

We develop and implement a novel fast bootstrap for dependent data. Our scheme is based on the i.i.d. resampling of the smoothed moment indicators. We characterize the class of parametric and semi-parametric estimation problems for which…

Methodology · Statistics 2022-01-19 Davide La Vecchia , Alban Moor , Olivier Scaillet