Related papers: Improving optimal subsampling through stratificati…

200 papers

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…

Machine Learning · Computer Science 2019-07-29 Tiancheng Yu, Xiyu Zhai, Suvrit Sra

In this paper, we propose a stratified sampling algorithm in which the random drawings made in the strata to compute the expectation of interest are also used to adaptively modify the proportion of further drawings in each stratum. These…

Methodology · Statistics 2007-12-04 Pierre Etore, Benjamin Jourdain

This paper investigates the use of stratified sampling as a variance reduction technique for approximating integrals over large dimensional spaces. The accuracy of this method critically depends on the choice of the space partition, the…

Probability · Mathematics 2009-09-15 Pierre Etoré, Gersende Fort, Benjamin Jourdain, Eric Moulines

This paper studies a two-stage model of experimentation, where the researcher first samples representative units from an eligible pool, then assigns each sampled unit to treatment or control. To implement balanced sampling and assignment,…

Econometrics · Economics 2023-08-22 Max Cytrynbaum

Subsampling is a widely used and effective approach for addressing the computational challenges posed by massive datasets. Substantial progress has been made in developing non-uniform, probability-based subsampling schemes that prioritize…

Methodology · Statistics 2026-05-07 Dingyi Wang, Haiying Wang, Qingpei Hu

In classification problems, sampling bias between training data and testing data is critical to the ranking performance of classification scores. Such bias can be both unintentionally introduced by data collection and intentionally…

Methodology · Statistics 2017-11-02 Chandler Zuo

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where…

Computation · Statistics 2019-06-27 HaiYing Wang, Rong Zhu, Ping Ma

The statistical efficiency of randomized clinical trials can be improved by incorporating information from baseline covariates (i.e., pre-treatment patient characteristics). This can be done in the design stage using stratified (permutated…

Methodology · Statistics 2025-02-04 Zhiwei Zhang

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

Class imbalance and distributional differences in large datasets present significant challenges for classification tasks in machine learning, often leading to biased models and poor predictive performance for minority classes. This work…

Machine Learning · Statistics 2024-12-20 Alex Mak, Shubham Sahoo, Shivani Pandey, Yidan Yue, Linglong Kong

In many randomized trials, outcomes such as essays or open-ended responses must be manually scored as a preliminary step to impact analysis, a process that is costly and limiting. Model-assisted estimation offers a way to combine surrogate…

Methodology · Statistics 2026-02-16 Reagan Mozer, Nicole E. Pashley, Luke Miratrix

Online controlled experiments, also known as A/B testing, are the digital equivalent of randomized controlled trials for estimating the impact of marketing campaigns on website visitors. Stratified sampling is a traditional technique for…

Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…

Methodology · Statistics 2021-05-05 Tao Li, Cheng Meng

A balanced sampling design should always be the adopted strategy if auxiliary information is available. Besides, integrating a stratified structure of the population in the sampling process can considerably reduce the variance of the…

Methodology · Statistics 2022-06-03 Raphaël Jauslin, Esther Eustache, Yves Tillé

Subsampling algorithms for various parametric regression models with massive data have been extensively investigated in recent years. However, all existing studies on subsampling heavily rely on clean massive data. In practical…

Statistics Theory · Mathematics 2025-06-11 Jiangshan Ju, Mingqiu Wang, Shengli Zhao

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information…

Methodology · Statistics 2024-07-10 Haolin Chen, Holger Dette, Jun Yu

In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the…

Optimization and Control · Mathematics 2017-11-01 Raghu Bollapragada, Richard Byrd, Jorge Nocedal

Subsampling is an efficient method to deal with massive data. In this paper, we investigate the optimal subsampling for linear quantile regression when the covariates are functions. The asymptotic distribution of the subsampling estimator…

Numerical Analysis · Mathematics 2022-05-06 Qian Yan, Hanyu Li, Chengmei Niu

Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We…

Machine Learning · Statistics 2025-05-20 Yan Chen, Jose Blanchet, Krzysztof Dembczynski, Laura Fee Nern, Aaron Flores

Sub-sampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the…

Machine Learning · Statistics 2017-09-07 Daniel Ting, Eric Brochu