Related papers: More Efficient Estimation for Logistic Regression …

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang , Rong Zhu , Ping Ma

Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive Data

Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the…

Methodology · Statistics 2021-07-06 Jun Yu , HaiYing Wang , Mingyao Ai , Huiming Zhang

Sampling with replacement vs Poisson sampling: a comparative study in optimal subsampling

Faced with massive data, subsampling is a commonly used technique to improve computational efficiency, and using nonuniform subsampling probabilities is an effective approach to improve estimation efficiency. For computational efficiency,…

Statistics Theory · Mathematics 2022-05-19 Jing Wang , Jiahui Zou , HaiYing Wang

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu , Ping Ma , Michael W. Mahoney , Bin Yu

Unweighted estimation based on optimal sample under measurement constraints

To tackle massive data, subsampling is a practical approach to select the more informative data points. However, when responses are expensive to measure, developing efficient subsampling schemes is challenging, and an optimal sampling…

Computation · Statistics 2022-10-11 Jing Wang , HaiYing Wang , Shifeng Xiong

Optimal subsampling for functional quantile regression

Subsampling is an efficient method to deal with massive data. In this paper, we investigate the optimal subsampling for linear quantile regression when the covariates are functions. The asymptotic distribution of the subsampling estimator…

Numerical Analysis · Mathematics 2022-05-06 Qian Yan , Hanyu Li , Chengmei Niu

Poisson Subsampling Algorithms for Large Sample Linear Regression in Massive Data

Large sample size brings the computation bottleneck for modern data analysis. Subsampling is one of efficient strategies to handle this problem. In previous studies, researchers make more fo- cus on subsampling with replacement (SSR) than…

Machine Learning · Statistics 2015-11-24 Rong Zhu

Optional subsampling for generalized estimating equations in growing-dimensional longitudinal Data

As a powerful tool for longitudinal data analysis, the generalized estimating equations have been widely studied in the academic community. However, in large-scale settings, this approach faces pronounced computational and storage…

Computation · Statistics 2025-08-29 Chunjing Li , Jiahui Zhang , Xiaohui Yuan

Optimal Subsampling Algorithms for Big Data Regressions

To fast approximate maximum likelihood estimators with massive data, this paper studies the Optimal Subsampling Method under the A-optimality Criterion (OSMAC) for generalized linear models. The consistency and asymptotic normality of the…

Methodology · Statistics 2021-06-15 Mingyao Ai , Jun Yu , Huiming Zhang , HaiYing Wang

Optimal subsampling algorithm for the marginal model with large longitudinal data

Big data is ubiquitous in practices, and it has also led to heavy computation burden. To reduce the calculation cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…

Methodology · Statistics 2023-11-16 Haohui Han , Liya Fu

Poisson Regression in one Covariate on Massive Data

The goal of subsampling is to select an informative subset of all observations, when using the full data for statistical analysis is not viable. We construct locally $ D $-optimal subsampling designs under a Poisson regression model with a…

Statistics Theory · Mathematics 2024-03-28 Torsten Reuter , Rainer Schwabe

Asymptotically Optimal Bias Reduction for Parametric Models

An important challenge in statistical analysis concerns the control of the finite sample bias of estimators. This problem is magnified in high-dimensional settings where the number of variables $p$ diverges with the sample size $n$, as well…

Statistics Theory · Mathematics 2020-02-21 Stéphane Guerrier , Mucyo Karemera , Samuel Orso , Maria-Pia Victoria-Feser

A Provably Accurate Randomized Sampling Algorithm for Logistic Regression

In statistics and machine learning, logistic regression is a widely-used supervised learning technique primarily employed for binary classification tasks. When the number of observations greatly exceeds the number of predictor variables, we…

Machine Learning · Statistics 2024-04-02 Agniva Chowdhury , Pradeep Ramuhalli

Approximating Partial Likelihood Estimators via Optimal Subsampling

With the growing availability of large-scale biomedical data, it is often time-consuming or infeasible to directly perform traditional statistical analysis with relatively limited computing resources at hand. We propose a fast subsampling…

Methodology · Statistics 2023-05-18 Haixiang Zhang , Lulu Zuo , HaiYing Wang , Liuquan Sun

Optimal subsampling for quantile regression in big data

We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the…

Computation · Statistics 2020-01-29 HaiYing Wang , Yanyuan Ma

An efficient stochastic Newton algorithm for parameter estimation in logistic regressions

Logistic regression is a well-known statistical model which is commonly used in the situation where the output is a binary random variable. It has a wide range of applications including machine learning, public health, social sciences,…

Statistics Theory · Mathematics 2019-04-18 Bernard Bercu , Antoine Godichon-Baggioni , Bruno Portier

Improving optimal subsampling through stratification

Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…

Methodology · Statistics 2025-12-25 Jasper B. Yang , Thomas Lumley , Bryan E. Shepherd , Pamela A. Shaw

Local case-control sampling: Efficient subsampling in imbalanced data sets

For classification problems with significant class imbalance, subsampling can reduce computational costs at the price of inflated variance in estimating model parameters. We propose a method for subsampling efficiently for logistic…

Computation · Statistics 2014-09-24 William Fithian , Trevor Hastie

Optimal Downsampling for Imbalanced Classification with Generalized Linear Models

Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We…

Machine Learning · Statistics 2025-05-20 Yan Chen , Jose Blanchet , Krzysztof Dembczynski , Laura Fee Nern , Aaron Flores

Optimal Subsampling for Large Sample Ridge Regression

Subsampling is a popular approach to alleviating the computational burden for analyzing massive datasets. Recent efforts have been devoted to various statistical models without explicit regularization. In this paper, we develop an efficient…

Methodology · Statistics 2022-04-12 Yunlu Chen , Nan Zhang