Related papers: Subsampled One-Step Estimation for Fast Statistica…

Poisson Subsampling Algorithms for Large Sample Linear Regression in Massive Data

A large sample size creates a computational bottleneck for modern data analysis. Subsampling is one of the efficient strategies for handling this problem. In previous studies, researchers have focused more on subsampling with replacement (SSR) than…

Machine Learning · Statistics 2015-11-24 Rong Zhu

Approximating Partial Likelihood Estimators via Optimal Subsampling

With the growing availability of large-scale biomedical data, it is often time-consuming or infeasible to directly perform traditional statistical analysis with relatively limited computing resources at hand. We propose a fast subsampling…

Methodology · Statistics 2023-05-18 Haixiang Zhang, Lulu Zuo, HaiYing Wang, Liuquan Sun

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

A Distributed One-Step Estimator

Distributed statistical inference has recently attracted enormous attention. Much existing work focuses on the averaging estimator. We propose a one-step approach to enhance a simple-averaging based distributed estimator. We derive the…

Methodology · Statistics 2015-11-11 Cheng Huang, Xiaoming Huo

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang, Rong Zhu, Ping Ma

A Moment-assisted Approach for Improving Subsampling-based MLE with Large-scale data

Maximum likelihood estimation is computationally demanding for large datasets, particularly when the likelihood function includes integrals. Subsampling can reduce the computational burden, but it often results in efficiency loss. This…

Methodology · Statistics 2026-04-27 Miaomiao Su, Qihua Wang, Ruoyu Wang

Optimal Subsampling Algorithms for Big Data Regressions

To fast approximate maximum likelihood estimators with massive data, this paper studies the Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for generalized linear models. The consistency and asymptotic normality of the…

Methodology · Statistics 2021-06-15 Mingyao Ai, Jun Yu, Huiming Zhang, HaiYing Wang

Scalable subsampling: computation, aggregation and inference

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat \theta _n$ in order to conduct nonparametric inference such as the construction of confidence intervals…

Statistics Theory · Mathematics 2021-12-14 Dimitris N. Politis

More Efficient Estimation for Logistic Regression with Optimal Subsample

In this paper, we propose an improved estimation method for logistic regression based on subsamples taken according to the optimal subsampling probabilities developed in Wang et al. (2018). Both asymptotic results and numerical results show that the…

Methodology · Statistics 2021-06-24 HaiYing Wang

Orthogonal Subsampling for Big Data Linear Regression

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal…

Methodology · Statistics 2021-06-01 Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

Sampling with replacement vs Poisson sampling: a comparative study in optimal subsampling

Faced with massive data, subsampling is a commonly used technique to improve computational efficiency, and using nonuniform subsampling probabilities is an effective approach to improve estimation efficiency. For computational efficiency,…

Statistics Theory · Mathematics 2022-05-19 Jing Wang, Jiahui Zou, HaiYing Wang

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…

Methodology · Statistics 2023-02-16 Yingying Ma, Chenlei Leng, Hansheng Wang

On the Subbagging Estimation for Massive Data

This article introduces subbagging (subsample aggregating) estimation approaches for big data analysis with memory constraints of computers. Specifically, for the whole dataset with size $N$, $m_N$ subsamples are randomly drawn, and each…

Methodology · Statistics 2021-03-05 Tao Zou, Xian Li, Xuan Liang, Hansheng Wang

Novel Subsampling Strategies for Heavily Censored Reliability Data

Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan, Zan Li, Zhaohui Li, Dennis K. J. Lin, Qingpei Hu, Dan Yu

Single-Index Model-Assisted Estimation In Survey Sampling

A model-assisted semiparametric method of estimating finite population totals is investigated to improve the precision of survey estimators by incorporating multivariate auxiliary information. The proposed superpopulation model is a…

Methodology · Statistics 2019-03-19 Lily Wang

Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive Data

Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the…

Methodology · Statistics 2021-07-06 Jun Yu, HaiYing Wang, Mingyao Ai, Huiming Zhang

Optimal subsampling algorithm for the marginal model with large longitudinal data

Big data is ubiquitous in practice, and it has also led to a heavy computational burden. To reduce the calculation cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…

Methodology · Statistics 2023-11-16 Haohui Han, Liya Fu

Subsampled Optimization: Statistical Guarantees, Mean Squared Error Approximation, and Sampling Method

For optimization on large-scale data, exactly calculating the solution may be computationally difficult because of the large size of the data. In this paper we consider subsampled optimization for quickly approximating the exact solution. In…

Machine Learning · Statistics 2018-04-11 Rong Zhu, Jiming Jiang

Moment-assisted subsampling method for Cox proportional hazards model with large-scale data

The Cox proportional hazards model is widely used in survival analysis to model time-to-event data. However, it faces significant computational challenges in the era of large-scale data, particularly when dealing with time-dependent…

Methodology · Statistics 2025-01-14 Miaomiao Su, Ruoyu Wang

One-Step Estimation With Scaled Proximal Methods

We study statistical estimators computed using iterative optimization methods that are not run until completion. Classical results on maximum likelihood estimators (MLEs) assert that a one-step estimator (OSE), in which a single…

Optimization and Control · Mathematics 2021-06-28 Robert Bassett, Julio Deride