Related papers: Robust, scalable and fast bootstrap method for ana…

A subsampled double bootstrap for massive data

The bootstrap is a popular and powerful method for assessing precision of estimators and inferential methods. However, for massive datasets which are increasingly prevalent, the bootstrap becomes prohibitively costly in computation and its…

Methodology · Statistics 2015-08-06 Srijan Sengupta, Stanislav Volgushev, Xiaofeng Shao

A Scalable Bootstrap for Massive Data

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, which are increasingly prevalent, the computation of bootstrap-based quantities can be prohibitively…

Methodology · Statistics 2012-06-29 Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar, Michael I. Jordan

The Big Data Bootstrap

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we…

Machine Learning · Computer Science 2012-07-03 Ariel Kleiner, Ameet Talwalkar, Purnamrita Sarkar, Michael Jordan

Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data

In this paper, we address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers. The high volume and dimensionality of the data require distributed…

Machine Learning · Statistics 2022-11-30 Emadaldin Mozafari-Majd, Visa Koivunen

A Cheap Bootstrap Method for Fast Inference

The bootstrap is a versatile inference method that has proven powerful in many statistical problems. However, when applied to modern large-scale models, it could face substantial computation demand from repeated data resampling and model…

Methodology · Statistics 2022-02-02 Henry Lam

Scalable Resampling in Massive Generalized Linear Models via Subsampled Residual Bootstrap

Residual bootstrap is a classical method for statistical inference in regression settings. With massive data sets becoming increasingly common, there is a demand for computationally efficient alternatives to residual bootstrap. We propose a…

Methodology · Statistics 2024-09-30 Indrila Ganguly, Srijan Sengupta, Sujit Ghosh

Bootstrap in High Dimension with Low Computation

The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We…

Methodology · Statistics 2023-06-21 Henry Lam, Zhenyuan Liu

Statistical inference in massive datasets by empirical likelihood

In this paper, we propose a new statistical inference method for massive data sets, which is very simple and efficient by combining divide-and-conquer method and empirical likelihood. Compared with two popular methods (the bag of little…

Methodology · Statistics 2020-04-21 Xuejun Ma, Shaochen Wang, Wang Zhou

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…

Methodology · Statistics 2023-02-16 Yingying Ma, Chenlei Leng, Hansheng Wang

Variable Selection with Scalable Bootstrap in Generalized Linear Model for Massive Data

Bootstrap is commonly used as a tool for non-parametric statistical inference to estimate meaningful parameters in Variable Selection Models. However, for a massive dataset that has an exponential growth rate, the computation of Bootstrap…

Computation · Statistics 2016-12-26 Zhibing He, Yichen Qin, Ben-Chang Shia, Yang Li

Simultaneous Inference for Massive Data: Distributed Bootstrap

In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling,…

Machine Learning · Statistics 2020-02-21 Yang Yu, Shih-Kang Chao, Guang Cheng

Scalable inference in functional linear regression with streaming data

Traditional static functional data analysis is facing new challenges due to streaming data, where data constantly flow in. A major challenge is that storing such an ever-increasing amount of data in memory is nearly impossible. In addition,…

Methodology · Statistics 2023-10-11 Jinhan Xie, Enze Shi, Peijun Sang, Zuofeng Shang, Bei Jiang, Linglong Kong

Finite Sample Valid Inference via Calibrated Bootstrap

While widely used as a general method for uncertainty quantification, the bootstrap method encounters difficulties that raise concerns about its validity in practical applications. This paper introduces a new resampling-based method, termed…

Methodology · Statistics 2024-08-30 Yiran Jiang, Chuanhai Liu, Heping Zhang

A Random Sample Partition Data Model for Big Data Analysis

Big data sets must be carefully partitioned into statistically similar data subsets that can be used as representative samples for big data analysis tasks. In this paper, we propose the random sample partition (RSP) data model to represent…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-11 Salman Salloum, Yulin He, Joshua Zhexue Huang, Xiaoliang Zhang, Tamer Z. Emara, Chenghao Wei, Heping He

Gap bootstrap methods for massive data sets with an application to transportation engineering

In this paper we describe two bootstrap methods for massive data sets. Naive applications of common resampling methodology are often impractical for massive data sets due to computational burden and due to complex patterns of inhomogeneity.…

Applications · Statistics 2013-01-14 S. N. Lahiri, C. Spiegelman, J. Appiah, L. Rilett

Distributed Bootstrap for Simultaneous Inference Under High Dimensionality

We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines. The method produces an $\ell_\infty$-norm confidence region based on a…

Methodology · Statistics 2022-06-15 Yang Yu, Shih-Kang Chao, Guang Cheng

Inference by Stochastic Optimization: A Free-Lunch Bootstrap

Assessing sampling uncertainty in extremum estimation can be challenging when the asymptotic variance is not analytically tractable. Bootstrap inference offers a feasible solution but can be computationally costly especially when the model…

Econometrics · Economics 2020-09-15 Jean-Jacques Forneron, Serena Ng

An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression

This paper introduces a new data analysis method for big data using a newly defined regression model named multiple model linear regression (MMLR), which separates input datasets into subsets and constructs local linear regression models of…

Machine Learning · Computer Science 2023-08-25 Bohan Lyu, Jianzhong Li

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap is a useful tool for making statistical inference, but it may provide erroneous results under complex survey sampling. Most studies about bootstrap-based inference are developed under simple random sampling and stratified random…

Statistics Theory · Mathematics 2019-01-08 Zhonglei Wang, Jae Kwang Kim, Liuhua Peng

Fast Uncertainty Quantification for Kernel-Based Estimators in Large-Scale Causal Inference

Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As…

Methodology · Statistics 2026-03-17 Matthew Kosko, Falco J. Bargagli-Stoffi, Lin Wang, Michele Santacatterina