Related papers: Poisson Subsampling Algorithms for Large Sample Li…

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular for downsizing the data volume and reducing the computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang, Rong Zhu, Ping Ma

Sampling with replacement vs Poisson sampling: a comparative study in optimal subsampling

Faced with massive data, subsampling is a commonly used technique to improve computational efficiency, and using nonuniform subsampling probabilities is an effective approach to improve estimation efficiency. For computational efficiency,…

Statistics Theory · Mathematics 2022-05-19 Jing Wang, Jiahui Zou, HaiYing Wang

Subsampling for Big Data Linear Models with Measurement Errors

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju, Mingqiu Wang, Shengli Zhao

Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive Data

Nonuniform subsampling methods are effective in reducing the computational burden and maintaining estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the…

Methodology · Statistics 2021-07-06 Jun Yu, HaiYing Wang, Mingyao Ai, Huiming Zhang

A subsampling approach for large data sets when the Generalised Linear Model is potentially misspecified

Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial…

Methodology · Statistics 2025-10-08 Amalan Mahendran, Helen Thompson, James M. McGree

Optimal Subsampling for Large Sample Ridge Regression

Subsampling is a popular approach to alleviating the computational burden of analyzing massive datasets. Recent efforts have been devoted to various statistical models without explicit regularization. In this paper, we develop an efficient…

Methodology · Statistics 2022-04-12 Yunlu Chen, Nan Zhang

Orthogonal Subsampling for Big Data Linear Regression

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal…

Methodology · Statistics 2021-06-01 Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

Subsampled One-Step Estimation for Fast Statistical Inference

Subsampling is an effective approach to alleviate the computational burden associated with large-scale datasets. Nevertheless, existing subsampling estimators incur a substantial loss in estimation efficiency compared to estimators based on…

Methodology · Statistics 2025-09-25 Miaomiao Su, Ruoyu Wang

More Efficient Estimation for Logistic Regression with Optimal Subsample

In this paper, we propose an improved estimation method for logistic regression based on subsamples taken according to the optimal subsampling probabilities developed in Wang et al. (2018). Both asymptotic results and numerical results show that the…

Methodology · Statistics 2021-06-24 HaiYing Wang

Modern Subsampling Methods for Large-Scale Least Squares Regression

Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…

Methodology · Statistics 2021-05-05 Tao Li, Cheng Meng

Subsampled Optimization: Statistical Guarantees, Mean Squared Error Approximation, and Sampling Method

For optimization on large-scale data, exactly calculating the solution may be computationally difficult because of the large size of the data. In this paper we consider subsampled optimization for quickly approximating the exact solution. In…

Machine Learning · Statistics 2018-04-11 Rong Zhu, Jiming Jiang

Novel Subsampling Strategies for Heavily Censored Reliability Data

Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan, Zan Li, Zhaohui Li, Dennis K. J. Lin, Qingpei Hu, Dan Yu

Optional subsampling for generalized estimating equations in growing-dimensional longitudinal Data

As a powerful tool for longitudinal data analysis, the generalized estimating equations have been widely studied in the academic community. However, in large-scale settings, this approach faces pronounced computational and storage…

Computation · Statistics 2025-08-29 Chunjing Li, Jiahui Zhang, Xiaohui Yuan

Multi-resolution subsampling for large-scale linear classification

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen, Holger Dette, Jun Yu

Nonsingular subsampling for S-estimators with categorical predictors

An integral part of many algorithms for S-estimators of linear regression is random subsampling. For problems with only continuous predictors, simple random subsampling is a reliable method to generate initial coefficient estimates that can…

Computation · Statistics 2012-08-29 Manuel Koller

Optimal subsampling for quantile regression in big data

We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the…

Computation · Statistics 2020-01-29 HaiYing Wang, Yanyuan Ma

Optimal subsampling algorithm for the marginal model with large longitudinal data

Big data is ubiquitous in practice, and it has also led to a heavy computational burden. To reduce the computational cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…

Methodology · Statistics 2023-11-16 Haohui Han, Liya Fu

Poisson Regression in one Covariate on Massive Data

The goal of subsampling is to select an informative subset of all observations, when using the full data for statistical analysis is not viable. We construct locally $D$-optimal subsampling designs under a Poisson regression model with a…

Statistics Theory · Mathematics 2024-03-28 Torsten Reuter, Rainer Schwabe

Swift Sampler: Efficient Learning of Sampler by 10 Parameters

Data selection is essential for training deep learning models. An effective data sampler assigns proper sampling probability for training data and helps the model converge to a good local minimum with high performance. Previous studies in…

Machine Learning · Computer Science 2024-10-10 Jiawei Yao, Chuming Li, Canran Xiao