Related papers: A Framework for Efficient Model Evaluation through…

Label-Efficient Monitoring of Classification Models via Stratified Importance Sampling

Monitoring the performance of classification models in production is critical yet challenging due to strict labeling budgets, one-shot batch acquisition of labels and extremely low error rates. We propose a general framework based on…

Machine Learning · Computer Science 2026-02-02 Lupo Marsigli, Angel Lopez de Haro

Stratified Sampling for Model-Assisted Estimation with Surrogate Outcomes

In many randomized trials, outcomes such as essays or open-ended responses must be manually scored as a preliminary step to impact analysis, a process that is costly and limiting. Model-assisted estimation offers a way to combine surrogate…

Methodology · Statistics 2026-02-16 Reagan Mozer, Nicole E. Pashley, Luke Miratrix

Near Optimal Stratified Sampling

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…

Machine Learning · Computer Science 2019-07-29 Tiancheng Yu, Xiyu Zhai, Suvrit Sra

Effective Sampling: Fast Segmentation Using Robust Geometric Model Fitting

Identifying the underlying models in a set of data points contaminated by noise and outliers leads to a highly complex multi-model fitting problem. This problem can be posed as a clustering problem by the projection of higher order…

Computer Vision and Pattern Recognition · Computer Science 2018-08-01 Ruwan Tennakoon, Alireza Sadri, Reza Hoseinnezhad, Alireza Bab-Hadiashar

Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

We study the data selection problem, whose aim is to select a small representative subset of data that can be used to efficiently train a machine learning model. We present a new data selection approach based on $k$-means clustering and…

Machine Learning · Computer Science 2024-02-28 Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder

Bridging Stratification and Regression Adjustment: Batch-Adaptive Stratification with Post-Design Adjustment in Randomized Experiments

To increase statistical efficiency in a randomized experiment, researchers often use stratification (i.e., blocking) in the design stage. However, conventional practices of stratification fail to exploit valuable information about the…

Methodology · Statistics 2025-10-28 Zikai Li

Learning to Sample: Counting with Complex Queries

We study the problem of efficiently estimating counts for queries involving complex filters, such as user-defined functions, or predicates involving self-joins and correlated subqueries. For such queries, traditional sampling techniques may…

Databases · Computer Science 2020-01-01 Brett Walenz, Stavros Sintos, Sudeepa Roy, Jun Yang

Modeling with Categorical Features via Exact Fusion and Sparsity Regularisation

We study the high-dimensional linear regression problem with categorical predictors that have many levels. We propose a new estimation approach, which performs model compression via two mechanisms by simultaneously encouraging (a)…

Methodology · Statistics 2026-03-30 Kayhan Behdin, Riade Benbaki, Peter Radchenko, Rahul Mazumder

On adaptive stratification

This paper investigates the use of stratified sampling as a variance reduction technique for approximating integrals over large dimensional spaces. The accuracy of this method critically depends on the choice of the space partition, the…

Probability · Mathematics 2009-09-15 Pierre Etoré, Gersende Fort, Benjamin Jourdain, Eric Moulines

Stratified Random Sampling for Dependent Inputs

A new approach of obtaining stratified random samples from statistically dependent random variables is described. The proposed method can be used to obtain samples from the input space of a computer forward model in estimating expectations…

Methodology · Statistics 2019-11-25 Anirban Mondal, Abhijit Mandal

Reducing Estimation Uncertainty Using Normalizing Flows and Stratification

Estimating the expectation of a real-valued function of a random variable from sample data is a critical aspect of statistical analysis, with far-reaching implications in various applications. Current methodologies typically assume…

Machine Learning · Computer Science 2026-02-18 Paweł Lorek, Rafał Nowak, Rafał Topolnicki, Tomasz Trzciński, Maciej Zięba, Aleksandra Krystecka

Simulation Model Calibration with Dynamic Stratification and Adaptive Sampling

Calibrating simulation models that take large quantities of multi-dimensional data as input is a hard simulation optimization problem. Existing adaptive sampling strategies offer a methodological solution. However, they may not sufficiently…

Methodology · Statistics 2024-07-17 Pranav Jain, Sara Shashaani, Eunshin Byon

Toward More Effective Human Evaluation for Machine Translation

Improvements in text generation technologies such as machine translation have necessitated more costly and time-consuming human evaluation procedures to ensure an accurate signal. We investigate a simple way to reduce cost by reducing the…

Computation and Language · Computer Science 2022-04-12 Belén Saldías, George Foster, Markus Freitag, Qijun Tan

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis, Omar Oubari, Gregor Lenz

Revisiting Score Function Estimators for $k$-Subset Sampling

Are score function estimators an underestimated approach to learning with $k$-subset sampling? Sampling $k$-subsets is a fundamental operation in many machine learning tasks that is not amenable to differentiable parametrization, impeding…

Machine Learning · Computer Science 2024-08-19 Klas Wijk, Ricardo Vinuesa, Hossein Azizpour

Improving optimal subsampling through stratification

Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…

Methodology · Statistics 2025-12-25 Jasper B. Yang, Thomas Lumley, Bryan E. Shepherd, Pamela A. Shaw

Calibration for Stratified Classification Models

In classification problems, sampling bias between training data and testing data is critical to the ranking performance of classification scores. Such bias can be both unintentionally introduced by data collection and intentionally…

Methodology · Statistics 2017-11-02 Chandler Zuo

A unified framework for covariate adjustment under stratified randomization

Randomization, as a key technique in clinical trials, can eliminate sources of bias and produce comparable treatment groups. In randomized experiments, the treatment effect is a parameter of general interest. Researchers have explored the…

Methodology · Statistics 2023-12-05 Fuyi Tu, Wei Ma, Hanzhong Liu

StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics

In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble…

Machine Learning · Computer Science 2024-04-19 Angelos Chatzimparmpas, Rafael M. Martins, Kostiantyn Kucher, Andreas Kerren

Fitting Multiple Machine Learning Models with Performance Based Clustering

Traditional machine learning approaches assume that data comes from a single generating mechanism, which may not hold for most real life data. In these cases, the single mechanism assumption can result in suboptimal performance. We…

Machine Learning · Computer Science 2025-01-31 Mehmet Efe Lorasdagi, Ahmet Berker Koc, Ali Taha Koc, Suleyman Serdar Kozat