Related papers: Learning to Sample: Counting with Complex Queries

Re-Assessing the "Classify and Count" Quantification Method

Learning to quantify (a.k.a. quantification) is a task concerned with training unbiased estimators of class prevalence via supervised learning. This task originated with the observation that "Classify and Count" (CC), the trivial method of…

Machine Learning · Computer Science 2021-09-22 Alejandro Moreo, Fabrizio Sebastiani

On adaptive stratification

This paper investigates the use of stratified sampling as a variance reduction technique for approximating integrals over large dimensional spaces. The accuracy of this method critically depends on the choice of the space partition, the…

Probability · Mathematics 2009-09-15 Pierre Etoré, Gersende Fort, Benjamin Jourdain, Eric Moulines

Stratified Random Sampling for Dependent Inputs

A new approach of obtaining stratified random samples from statistically dependent random variables is described. The proposed method can be used to obtain samples from the input space of a computer forward model in estimating expectations…

Methodology · Statistics 2019-11-25 Anirban Mondal, Abhijit Mandal

Improving optimal subsampling through stratification

Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…

Methodology · Statistics 2025-12-25 Jasper B. Yang, Thomas Lumley, Bryan E. Shepherd, Pamela A. Shaw

A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation

Model performance evaluation is a critical and expensive task in machine learning and computer vision. Without clear guidelines, practitioners often estimate model accuracy using a one-time completely random selection of the data. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Riccardo Fogliato, Pratik Patil, Mathew Monfort, Pietro Perona

Calibration for Stratified Classification Models

In classification problems, sampling bias between training data and testing data is critical to the ranking performance of classification scores. Such bias can be both unintentionally introduced by data collection and intentionally…

Methodology · Statistics 2017-11-02 Chandler Zuo

Optimal Stratification of Survey Experiments

This paper studies a two-stage model of experimentation, where the researcher first samples representative units from an eligible pool, then assigns each sampled unit to treatment or control. To implement balanced sampling and assignment,…

Econometrics · Economics 2023-08-22 Max Cytrynbaum

Simulation Model Calibration with Dynamic Stratification and Adaptive Sampling

Calibrating simulation models that take large quantities of multi-dimensional data as input is a hard simulation optimization problem. Existing adaptive sampling strategies offer a methodological solution. However, they may not sufficiently…

Methodology · Statistics 2024-07-17 Pranav Jain, Sara Shashaani, Eunshin Byon

SampleAhead: Online Classifier-Sampler Communication for Learning from Synthesized Data

State-of-the-art techniques of artificial intelligence, in particular deep learning, are mostly data-driven. However, collecting and manually labeling a large scale dataset is both difficult and expensive. A promising alternative is to…

Computer Vision and Pattern Recognition · Computer Science 2018-07-31 Qi Chen, Weichao Qiu, Yi Zhang, Lingxi Xie, Alan Yuille

Sampling with Costs

We consider the problem of choosing the best of $n$ samples, out of a large random pool, when the sampling of each member is associated with a certain cost. The quality (worth) of the best sample clearly increases with $n$, but so do the…

Statistics Theory · Mathematics 2015-06-16 Joseph D. Skufca, Daniel ben-Avraham

Sampling Unknown Decision Functions to Build Classifier Copies

Copies have been proposed as a viable alternative to endow machine learning models with properties and features that adapt them to changing needs. A fundamental step of the copying process is generating an unlabelled set of points to…

Machine Learning · Computer Science 2019-10-02 Irene Unceta, Diego Palacios, Jordi Nin, Oriol Pujol

Subset Selection for Stratified Sampling in Online Controlled Experiments

Online controlled experiments, also known as A/B testing, are the digital equivalent of randomized controlled trials for estimating the impact of marketing campaigns on website visitors. Stratified sampling is a traditional technique for…

Computation · Statistics 2025-09-22 Haru Momozu, Yuki Uehara, Naoki Nishimura, Koya Ohashi, Deddy Jobson, Yilin Li, Phuong Dinh, Noriyoshi Sukegawa, Yuichi Takano

Estimation from Partially Sampled Distributed Traces

Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used…

Data Structures and Algorithms · Computer Science 2021-07-19 Otmar Ertl

Cost Issue in Estimation of Proportion in a Finite Population Divided Among Two Strata

The problem of estimation of the proportion of units with a given attribute in a finite population is considered. From the population a sample is drawn due to the simple random sampling without replacement. There are limited funds for…

Statistics Theory · Mathematics 2019-03-26 Dominik Sieradzki, Wojciech Zieliński

Enhanced Cube Implementation For Highly Stratified Population

A balanced sampling design should always be the adopted strategy if auxiliary information is available. Besides, integrating a stratified structure of the population in the sampling process can considerably reduce the variance of the…

Methodology · Statistics 2022-06-03 Raphaël Jauslin, Esther Eustache, Yves Tillé

Experimental Analysis of a Generalized Stratified Sampling Algorithm for Hypercubes

Stratified sampling is a fast and simple method to generate point sets with uniform distribution in hypercubes. However, for the most common paraxial stratification it has the prominent drawback that the number of sampled points in n…

Computation · Statistics 2018-06-14 Simon Wessing

Adaptive Threshold Sampling

Sampling is a fundamental problem in computer science and statistics. However, for a given task and stream, it is often not possible to choose good sampling probabilities in advance. We derive a general framework for adaptively changing the…

Machine Learning · Statistics 2022-06-16 Daniel Ting

Near Optimal Stratified Sampling

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…

Machine Learning · Computer Science 2019-07-29 Tiancheng Yu, Xiyu Zhai, Suvrit Sra

Better Classifier Calibration for Small Data Sets

Classifier calibration does not always go hand in hand with the classifier's ability to separate the classes. There are applications where good classifier calibration, i.e. the ability to produce accurate probability estimates, is more…

Machine Learning · Computer Science 2020-05-26 Tuomo Alasalmi, Jaakko Suutala, Heli Koskimäki, Juha Röning

Batch mode active learning for efficient parameter estimation

For many tasks of data analysis, we may only have information on the explanatory variable, and the evaluation of the response values is quite expensive. While it is impractical or too costly to obtain the responses of all units, a…

Computation · Statistics 2023-04-07 Wei Zheng, Ting Tian, Xueqin Wang