Related papers: On optimal spatial subsample size for variance est…
For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…
In this paper, we investigate the randomized algorithms for block matrix multiplication from random sampling perspective. Based on the A-optimal design criterion, the optimal sampling probabilities and sampling block sizes are obtained. To…
We establish a general theory of optimality for block bootstrap distribution estimation for sample quantiles under a mild strong mixing assumption. In contrast to existing results, we study the block bootstrap for varying numbers of blocks.…
The Birnbaum-Saunders distribution has been widely applied in several areas of science and although several methodologies related to this distribution have been proposed, the problem of determining the optimal sample size for estimating its…
The block maximum method, which is widely used in extreme value analysis, uses a generalized extreme value distribution to approximate that of the maximum of m observations. The quality of this approximation depends on the value of m and…
We introduce a very general method for sparse and large-scale variable selection. The large-scale regression settings is such that both the number of parameters and the number of samples are extremely large. The proposed method is based on…
We consider the minimization of a sum of an expectation-valued coordinate-wise $L_i$-smooth nonconvex function and a nonsmooth block-separable convex regularizer. We propose an asynchronous variance-reduced algorithm, where in each…
Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…
An optimal mesh size of the sampling region can help to reduce computational burden in practical applications. In this work, we investigate optimal choices of mesh sizes for the identifications of medium obstacles from either the far-field…
We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent (a.k.a. least-mean-squares). In the strongly-convex case, we provide…
For optimization on large-scale data, exactly calculating its solution may be computationally difficulty because of the large size of the data. In this paper we consider subsampled optimization for fast approximating the exact solution. In…
A theory of superefficiency and adaptation is developed under flexible performance measures which give a multiresolution view of risk and bridge the gap between pointwise and global estimation. This theory provides a useful benchmark for…
In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the…
We investigate the accuracy of two general non-parametric methods for estimating optimal block lengths for block bootstraps with time series - the first proposed in the seminal paper of Hall, Horowitz and Jing (Biometrika 82 (1995) 561-574)…
We consider batch size selection for a general class of multivariate batch means variance estimators, which are computationally viable for high-dimensional Markov chain Monte Carlo simulations. We derive the asymptotic mean squared error…
The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…
Overparametrization often helps improve the generalization performance. This paper presents a dual view of overparametrization suggesting that downsampling may also help generalize. Focusing on the proportional regime $m\asymp n \asymp p$,…
We consider the problem of approximating an unknown function from point evaluations. This problem is a crucial subproblem in many modern (nonlinear) approximation schemes. When obtaining these point evaluations is costly, minimising the…
Block-based resampling estimators have been intensively investigated for weakly dependent time processes, which has helped to inform implementation (e.g., best block sizes). However, little is known about resampling performance and block…
We study the probabilistic sampling of a random variable, in which the variable is sampled only if it falls outside a given set, which is called the silence set. This helps us to understand optimal event-based sampling for the special case…