Related papers: Optimal Subsampling for Large Sample Logistic Regr…
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…
For massive data stored at multiple machines, we propose a distributed subsampling procedure for the composite quantile regression. By establishing the consistency and asymptotic normality of the composite quantile regression estimator from…
To fast approximate maximum likelihood estimators with massive data, this paper studies the Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for generalized linear models. The consistency and asymptotic normality of the…
Subsampling is a popular approach to alleviating the computational burden for analyzing massive datasets. Recent efforts have been devoted to various statistical models without explicit regularization. In this paper, we develop an efficient…
Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…
Big data is ubiquitous in practices, and it has also led to heavy computation burden. To reduce the calculation cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…
Datasets with sheer volume have been generated from fields including computer vision, medical imageology, and astronomy whose large-scale and high-dimensional properties hamper the implementation of classical statistical models. To tackle…
In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…
In this paper, we propose improved estimation method for logistic regression based on subsamples taken according the optimal subsampling probabilities developed in Wang et al. 2018 Both asymptotic results and numerical results show that the…
With the growing availability of large-scale biomedical data, it is often time-consuming or infeasible to directly perform traditional statistical analysis with relatively limited computing resources at hand. We propose a fast subsampling…
Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…
Large sample size brings the computation bottleneck for modern data analysis. Subsampling is one of efficient strategies to handle this problem. In previous studies, researchers make more fo- cus on subsampling with replacement (SSR) than…
We investigate the feature compression of high-dimensional ridge regression using the optimal subsampling technique. Specifically, based on the basic framework of random sampling algorithm on feature for ridge regression and the A-optimal…
Subsampling is an efficient method to deal with massive data. In this paper, we investigate the optimal subsampling for linear quantile regression when the covariates are functions. The asymptotic distribution of the subsampling estimator…
We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the…
For optimization on large-scale data, exactly calculating its solution may be computationally difficulty because of the large size of the data. In this paper we consider subsampled optimization for fast approximating the exact solution. In…
As computer resources become increasingly limited, traditional statistical methods face challenges in analyzing massive data, especially in functional data analysis. To address this issue, subsampling offers a viable solution by…
Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the…
A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be…
Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…