Related papers: Resampling Methods with Imputed Data
Statistical resampling methods have become feasible for parametric estimation, hypothesis testing, and model validation now that the computer is a ubiquitous tool for statisticians. This essay focuses on the resampling technique for…
We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight…
Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners often suffer from limited computational resources. In most…
Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is non-symmetric. It remains however unclear how to…
We provide computationally attractive methods to obtain jackknife-based cluster-robust variance matrix estimators (CRVEs) for linear regression models estimated by least squares. We also propose several new variants of the wild cluster…
Multiple imputation is a highly recommended technique to deal with missing data, but the application to longitudinal datasets can be done in multiple ways. When a new wave of longitudinal data arrives, we can treat the combined data of…
The bootstrap is a versatile inference method that has proven powerful in many statistical problems. However, when applied to modern large-scale models, it could face substantial computation demand from repeated data resampling and model…
Bootstrap is a useful tool for making statistical inference, but it may provide erroneous results under complex survey sampling. Most studies about bootstrap-based inference are developed under simple random sampling and stratified random…
In non-linear estimations, it is common to assess sampling uncertainty by bootstrap inference. For complex models, this can be computationally intensive. This paper combines optimization with resampling: turning stochastic optimization into…
Mixture models are a popular tool in model-based clustering. Such a model is often fitted by a procedure that maximizes the likelihood, such as the EM algorithm. At convergence, the maximum likelihood parameter estimates are typically…
We present correction terms that allow delete-one Jackknife and Bootstrap methods to be used to recover unbiased estimates of the data covariance matrix of the two-point correlation function $\xi\left(\mathbf{r}\right)$. We demonstrate the…
Non-probability sampling, for example in the form of online panels, has become a fast and cheap method to collect data. While reliable inference tools are available for classical probability samples, non-probability samples can yield…
Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series…
Presence of missing values in a dataset can adversely affect the performance of a classifier. Single and Multiple Imputation are normally performed to fill in the missing values. In this paper, we present several variants of combining…
The bootstrap is a popular data-driven method to quantify statistical uncertainty, but for modern high-dimensional problems, it could suffer from huge computational costs due to the need to repeatedly generate resamples and refit models. We…
Bootstrap techniques (also called resampling computation techniques) have introduced new advances in modeling and model evaluation. Using resampling methods to construct a series of new samples which are based on the original data set,…
Multiple imputation has become one of the most popular approaches for handling missing data in statistical analyses. Part of this success is due to Rubin's simple combination rules. These give frequentist valid inferences when the…
In this paper we describe two bootstrap methods for massive data sets. Naive applications of common resampling methodology are often impractical for massive data sets due to computational burden and due to complex patterns of inhomogeneity.…
Compressed sensing proposes to reconstruct more degrees of freedom in a signal than the number of values actually measured. Compressed sensing therefore risks introducing errors -- inserting spurious artifacts or masking the abnormalities…
While widely used as a general method for uncertainty quantification, the bootstrap method encounters difficulties that raise concerns about its validity in practical applications. This paper introduces a new resampling-based method, termed…