Related papers: Why stratification may hurt, & how much

200 papers

Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…

Methodology · Statistics 2025-12-25 Jasper B. Yang , Thomas Lumley , Bryan E. Shepherd , Pamela A. Shaw

The problem of estimating the proportion of units with a given attribute in a finite population is considered. A sample is drawn from the population by simple random sampling without replacement. There are limited funds for…

Statistics Theory · Mathematics 2019-03-26 Dominik Sieradzki , Wojciech Zieliński

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…

Machine Learning · Computer Science 2019-07-29 Tiancheng Yu , Xiyu Zhai , Suvrit Sra

Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used…

Data Structures and Algorithms · Computer Science 2021-07-19 Otmar Ertl

A balanced sampling design should always be the adopted strategy if auxiliary information is available. Moreover, integrating a stratified structure of the population into the sampling process can considerably reduce the variance of the…

Methodology · Statistics 2022-06-03 Raphaël Jauslin , Esther Eustache , Yves Tillé

In this paper we examine quantile-stratified samples from a known univariate probability distribution, with stratification occurring over a partition of the quantile regions in the distribution. We examine some general properties of this…

Methodology · Statistics 2025-09-09 Ben O'Neill

Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may…

Machine Learning · Computer Science 2019-11-18 Luke Oakden-Rayner , Jared Dunnmon , Gustavo Carneiro , Christopher Ré

We study the problem of efficiently estimating counts for queries involving complex filters, such as user-defined functions, or predicates involving self-joins and correlated subqueries. For such queries, traditional sampling techniques may…

Databases · Computer Science 2020-01-01 Brett Walenz , Stavros Sintos , Sudeepa Roy , Jun Yang

In this paper, we propose a stratified sampling algorithm in which the random drawings made in the strata to compute the expectation of interest are also used to adaptively modify the proportion of further drawings in each stratum. These…

Methodology · Statistics 2007-12-04 Pierre Etore , Benjamin Jourdain

Some improved estimators are proposed for estimating the population mean in stratified sampling in the presence of auxiliary information. The mean square errors (MSE) of the proposed estimators have been derived under a large-sample approximation.…

Statistics Theory · Mathematics 2013-09-13 Rajesh Singh , Viplav K. Singh , A. A. Adewara

This work considers the allocation problem for multivariate stratified random sampling as a problem of integer non-linear stochastic multiobjective mathematical programming. With this goal in mind the asymptotic distribution of the vector…

Methodology · Statistics 2011-06-07 Jose A. Diaz-Garcia , Rogelio Ramos-Quiroga

We consider the problem of choosing the best of $n$ samples, out of a large random pool, when the sampling of each member is associated with a certain cost. The quality (worth) of the best sample clearly increases with $n$, but so do the…

Statistics Theory · Mathematics 2015-06-16 Joseph D. Skufca , Daniel ben-Avraham

The past two decades have witnessed a surge of new research in the analysis of randomized experiments. The emergence of this literature may seem surprising given the widespread use and long history of experiments as the "gold standard" in…

Econometrics · Economics 2025-04-03 Yuehao Bai , Azeem M. Shaikh , Max Tabord-Meehan

Classifier calibration does not always go hand in hand with the classifier's ability to separate the classes. There are applications where good classifier calibration, i.e. the ability to produce accurate probability estimates, is more…

Machine Learning · Computer Science 2020-05-26 Tuomo Alasalmi , Jaakko Suutala , Heli Koskimäki , Juha Röning

Science and engineering problems subject to uncertainty are frequently both computationally expensive and feature nonsmooth parameter dependence, making standard Monte Carlo too slow, and excluding efficient use of accelerated uncertainty…

Numerical Analysis · Mathematics 2021-10-01 Per Pettersson , Sebastian Krumscheid

The problem of optimal allocation of samples in surveys using a stratified sampling plan was first discussed by Neyman in 1934. Since then, many researchers have studied the problem of the sample allocation in multivariate surveys and…

Discrete Mathematics · Computer Science 2013-09-25 Jose Andre de Moura Brito , Gustavo Silva Semaan , Pedro Luis do Nascimento Silva , Nelson Maculan

This paper studies a two-stage model of experimentation, where the researcher first samples representative units from an eligible pool, then assigns each sampled unit to treatment or control. To implement balanced sampling and assignment,…

Econometrics · Economics 2023-08-22 Max Cytrynbaum

Granular materials size segregate when exposed to external periodic perturbations such as vibrations. Moreover, mixtures of grains of different sizes spontaneously segregate in the absence of external perturbations: when a mixture is simply…

Statistical Mechanics · Physics 2015-06-25 Hernan A. Makse , Shlomo Havlin , Peter R. King , H. Eugene Stanley

In classification problems, sampling bias between training data and testing data is critical to the ranking performance of classification scores. Such bias can be both unintentionally introduced by data collection and intentionally…

Methodology · Statistics 2017-11-02 Chandler Zuo

Post-stratification is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey. In natural resource surveys, such information is often obtained from…

Statistics Theory · Mathematics 2008-12-18 F. Jay Breidt , Jean D. Opsomer