Related papers: Why stratification may hurt, & how much
Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…
The problem of estimating the proportion of units with a given attribute in a finite population is considered. A sample is drawn from the population by simple random sampling without replacement. There are limited funds for…
The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…
Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used…
A balanced sampling design should always be the adopted strategy if auxiliary information is available. Besides, integrating a stratified structure of the population in the sampling process can considerably reduce the variance of the…
In this paper we examine quantile-stratified samples from a known univariate probability distribution, with stratification occurring over a partition of the quantile regions in the distribution. We examine some general properties of this…
Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may…
We study the problem of efficiently estimating counts for queries involving complex filters, such as user-defined functions, or predicates involving self-joins and correlated subqueries. For such queries, traditional sampling techniques may…
In this paper, we propose a stratified sampling algorithm in which the random drawings made in the strata to compute the expectation of interest are also used to adaptively modify the proportion of further drawings in each stratum. These…
Some improved estimators are proposed for estimating the population mean in stratified sampling in the presence of auxiliary information. The mean square error (MSE) of the proposed estimators has been derived under a large-sample approximation.…
This work considers the allocation problem for multivariate stratified random sampling as a problem of integer non-linear stochastic multiobjective mathematical programming. With this goal in mind, the asymptotic distribution of the vector…
We consider the problem of choosing the best of $n$ samples, out of a large random pool, when the sampling of each member is associated with a certain cost. The quality (worth) of the best sample clearly increases with $n$, but so do the…
The past two decades have witnessed a surge of new research in the analysis of randomized experiments. The emergence of this literature may seem surprising given the widespread use and long history of experiments as the "gold standard" in…
Classifier calibration does not always go hand in hand with the classifier's ability to separate the classes. There are applications where good classifier calibration, i.e. the ability to produce accurate probability estimates, is more…
Science and engineering problems subject to uncertainty are frequently both computationally expensive and feature nonsmooth parameter dependence, making standard Monte Carlo too slow, and excluding efficient use of accelerated uncertainty…
The problem of optimal allocation of samples in surveys using a stratified sampling plan was first discussed by Neyman in 1934. Since then, many researchers have studied the problem of the sample allocation in multivariate surveys and…
This paper studies a two-stage model of experimentation, where the researcher first samples representative units from an eligible pool, then assigns each sampled unit to treatment or control. To implement balanced sampling and assignment,…
Granular materials size segregate when exposed to external periodic perturbations such as vibrations. Moreover, mixtures of grains of different sizes spontaneously segregate in the absence of external perturbations: when a mixture is simply…
In classification problems, sampling bias between training data and testing data is critical to the ranking performance of classification scores. Such bias can be both unintentionally introduced by data collection and intentionally…
Post-stratification is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey. In natural resource surveys, such information is often obtained from…