Related papers: Optimal subsampling for functional composite quant…
Subsampling is an efficient method to deal with massive data. In this paper, we investigate the optimal subsampling for linear quantile regression when the covariates are functions. The asymptotic distribution of the subsampling estimator…
For massive data stored at multiple machines, we propose a distributed subsampling procedure for the composite quantile regression. By establishing the consistency and asymptotic normality of the composite quantile regression estimator from…
We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the…
Massive data bring the big challenges of memory and computation for analysis. These challenges can be tackled by taking subsamples from the full data as a surrogate. For functional data, it is common to collect multiple measurements over…
Big data is ubiquitous in practices, and it has also led to heavy computation burden. To reduce the calculation cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…
Sub-sampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the…
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…
Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…
Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the…
For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…
Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…
Subsampling is commonly used to overcome computational and economical bottlenecks in the analysis of finite populations and massive datasets. Existing methods are often limited in scope and use optimality criteria (e.g., A-optimality) with…
Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…
To fast approximate maximum likelihood estimators with massive data, this paper studies the Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for generalized linear models. The consistency and asymptotic normality of the…
In this paper, we study the functional linear multiplicative model based on the least product relative error criterion. Under some regularization conditions, we establish the consistency and asymptotic normality of the estimator. Further,…
Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…
A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be…
Data reduction is a fundamental challenge of modern technology, where classical statistical methods are not applicable because of computational limitations. We consider multiple linear regression for an extraordinarily large number of…
Subsampling is a popular approach to alleviating the computational burden for analyzing massive datasets. Recent efforts have been devoted to various statistical models without explicit regularization. In this paper, we develop an efficient…
Supervised learning under measurement constraints is a common challenge in statistical and machine learning. In many applications, despite extensive design points, acquiring responses for all points is often impractical due to resource…