Related papers: Multi-resolution subsampling for large-scale linea…

A subsampling approach for large data sets when the Generalised Linear Model is potentially misspecified

Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial…

Methodology · Statistics 2025-10-08 Amalan Mahendran , Helen Thompson , James M. McGree

A model robust sub-sampling approach for Generalised Linear Models in Big data settings

In today's modern era of Big data, computationally efficient and scalable methods are needed to support timely insights and informed decision making. One such method is sub-sampling, where a subset of the Big data is analysed and used as…

Methodology · Statistics 2022-09-07 Amalan Mahendran , Helen Thompson , James M. McGree

Modern Subsampling Methods for Large-Scale Least Squares Regression

Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…

Methodology · Statistics 2021-05-05 Tao Li , Cheng Meng

Local Uncertainty Sampling for Large-Scale Multi-Class Logistic Regression

A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be…

Computation · Statistics 2018-09-14 Lei Han , Kean Ming Tan , Ting Yang , Tong Zhang

Balanced Subsampling for Big Data with Categorical Covariates

Supervised learning under measurement constraints is a common challenge in statistical and machine learning. In many applications, despite extensive design points, acquiring responses for all points is often impractical due to resource…

Methodology · Statistics 2025-03-19 Lin Wang

Diversity Subsampling: Custom Subsamples from Large Data Sets

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang , Daniel W. Apley , Sanjay Mehrotra

Optimal subsampling for functional composite quantile regression in massive data

As computer resources become increasingly limited, traditional statistical methods face challenges in analyzing massive data, especially in functional data analysis. To address this issue, subsampling offers a viable solution by…

Methodology · Statistics 2024-07-01 Jingxiang Pan , Xiaohui Yuan , Xiaohui Yuan

Optimal subsampling algorithm for the marginal model with large longitudinal data

Big data is ubiquitous in practices, and it has also led to heavy computation burden. To reduce the calculation cost and ensure the effectiveness of parameter estimators, an optimal subset sampling method is proposed to estimate the…

Methodology · Statistics 2023-11-16 Haohui Han , Liya Fu

Statistical Undersampling with Mutual Information and Support Points

Class imbalance and distributional differences in large datasets present significant challenges for classification tasks machine learning, often leading to biased models and poor predictive performance for minority classes. This work…

Machine Learning · Statistics 2024-12-20 Alex Mak , Shubham Sahoo , Shivani Pandey , Yidan Yue , Linglong Kong

D-optimal Subsampling Design for Massive Data Linear Regression

Data reduction is a fundamental challenge of modern technology, where classical statistical methods are not applicable because of computational limitations. We consider multiple linear regression for an extraordinarily large number of…

Methodology · Statistics 2025-05-30 Torsten Glemser , Rainer Schwabe

Maximum-Variance-Reduction Stratification for Improved Subsampling

Subsampling is a widely used and effective approach for addressing the computational challenges posed by massive datasets. Substantial progress has been made in developing non-uniform, probability-based subsampling schemes that prioritize…

Methodology · Statistics 2026-05-07 Dingyi Wang , Haiying Wang , Qingpei Hu

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu , Ping Ma , Michael W. Mahoney , Bin Yu

Optimal Downsampling for Imbalanced Classification with Generalized Linear Models

Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We…

Machine Learning · Statistics 2025-05-20 Yan Chen , Jose Blanchet , Krzysztof Dembczynski , Laura Fee Nern , Aaron Flores

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang , Rong Zhu , Ping Ma

Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive Data

Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the…

Methodology · Statistics 2021-07-06 Jun Yu , HaiYing Wang , Mingyao Ai , Huiming Zhang

Optimal Sub-sampling with Influence Functions

Sub-sampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the…

Machine Learning · Statistics 2017-09-07 Daniel Ting , Eric Brochu

Optimal subsampling for functional quantile regression

Subsampling is an efficient method to deal with massive data. In this paper, we investigate the optimal subsampling for linear quantile regression when the covariates are functions. The asymptotic distribution of the subsampling estimator…

Numerical Analysis · Mathematics 2022-05-06 Qian Yan , Hanyu Li , Chengmei Niu

Optimal Subsampling Bootstrap for Massive Data

The bootstrap is a widely used procedure for statistical inference because of its simplicity and attractive statistical properties. However, the vanilla version of bootstrap is no longer feasible computationally for many modern massive…

Methodology · Statistics 2023-02-16 Yingying Ma , Chenlei Leng , Hansheng Wang

Subsampling for Big Data Linear Models with Measurement Errors

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju , Mingqiu Wang , Shengli Zhao

Orthogonal Subsampling for Big Data Linear Regression

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal…

Methodology · Statistics 2021-06-01 Lin Wang , Jake Elmstedt , Weng Kee Wong , Hongquan Xu