Related papers: Data-Pooling in Stochastic Optimization

Mostly Beneficial Clustering: Aggregating Data for Operational Decision Making

With increasingly volatile market conditions and rapid product innovations, operational decision-making for large-scale systems entails solving thousands of problems with limited data. Data aggregation is proposed to combine the data across…

Machine Learning · Computer Science 2023-12-19 Chengzhang Li, Zhenkang Peng, Ying Rong

Stochastic Subsampling With Average Pooling

Regularization of deep neural networks has been an important issue to achieve higher generalization performance without overfitting problems. Although the popular method of Dropout provides a regularization effect, it causes inconsistent…

Machine Learning · Computer Science 2024-09-26 Bum Jun Kim, Sang Woo Kim

On a Near-Optimal & Efficient Algorithm for the Sparse Pooled Data Problem

The pooled data problem asks to identify the unknown labels of a set of items from condensed measurements. More precisely, given $n$ items, assume that each item has a label in $\{0,1,\ldots, d\}$, encoded via the ground-truth $\boldsymbol{\sigma}$.…

Probability · Mathematics 2023-12-25 Max Hahn-Klimroth, Remco van der Hofstad, Noela Müller, Connor Riddlesden

Stochastic Pooling Networks

We introduce and define the concept of a stochastic pooling network (SPN), as a model for sensor systems where redundancy and two forms of 'noise' -- lossy compression and randomness -- interact in surprising ways. Our approach to analyzing…

Statistical Mechanics · Physics 2009-01-26 Mark D. McDonnell, Pierre-Olivier Amblard, Nigel G. Stocks

Data-pooling Reinforcement Learning for Personalized Healthcare Intervention

Motivated by the emerging needs of personalized preventative intervention in many healthcare applications, we consider a multi-stage, dynamic decision-making problem in the online setting with unknown model parameters. To deal with the…

Machine Learning · Computer Science 2022-11-17 Xinyun Chen, Pengyi Shi, Shanwen Pu

Synthesizing Evidence: Data-Pooling as a Tool for Treatment Selection in Online Experiments

Randomized experiments are the gold standard for causal inference but face significant challenges in business applications, including limited traffic allocation, the need for heterogeneous treatment effect estimation, and the complexity of…

Methodology · Statistics 2025-08-18 Zhenkang Peng, Chengzhang Li, Ying Rong, Renyu Zhang

Near optimal efficient decoding from pooled data

Consider $n$ items, each of which is characterised by one of $d+1$ possible features in $\{0, \ldots, d\}$. We study the inference task of learning these types by queries on subsets, or pools, of the items that only reveal a form of…

Information Theory · Computer Science 2022-02-10 Max Hahn-Klimroth, Noela Müller

Bias Reduction in Sample-Based Optimization

We consider stochastic optimization problems which use observed data to estimate essential characteristics of the random quantities involved. Sample average approximation (SAA) or empirical (plug-in) estimation are very popular ways to use…

Statistics Theory · Mathematics 2021-03-16 Darinka Dentcheva, Yang Lin

Phase Transitions in the Pooled Data Problem

In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we…

Machine Learning · Statistics 2017-10-19 Jonathan Scarlett, Volkan Cevher

Dataset Distillation as Pushforward Optimal Quantization

Dataset distillation aims to find a synthetic training set such that training on the synthetic data achieves similar performance to training on real data, with orders of magnitude less computational requirements. Existing methods can be…

Machine Learning · Computer Science 2026-02-09 Hong Ye Tan, Emma Slade

Improve Cross-Architecture Generalization on Dataset Distillation

Dataset distillation, a pragmatic approach in machine learning, aims to create a smaller synthetic dataset from a larger existing dataset. However, existing distillation methods primarily adopt a model-based paradigm, where the synthetic…

Machine Learning · Computer Science 2024-02-21 Binglin Zhou, Linhao Zhong, Wentao Chen

Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets

Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there…

Machine Learning · Computer Science 2022-03-30 Vishnu Suresh Lokhande, Rudrasis Chakraborty, Sathya N. Ravi, Vikas Singh

Data Privacy and Specimen Pooling: Using an old tool for New Challenges

Background: In the context of ongoing debate over data confidentiality versus shared use of research data, as raised following the new EU General Data Protection Regulation, we seek to find alternate techniques that can balance these two…

Applications · Statistics 2016-06-20 Paramita Saha-Chaudhuri, Clarice Weinberg

Should data ever be thrown away? Pooling interval-censored data sets with different precision

Data quality is an important consideration in many engineering applications and projects. Data collection procedures do not always involve careful utilization of the most precise instruments and strictest protocols. As a consequence, data…

Methodology · Statistics 2023-03-02 Krasymyr Tretiak, Scott Ferson

Optimal split of orders across liquidity pools: a stochastic algorithm approach

Evolutions of the trading landscape lead to the capability to exchange the same financial instrument on different venues. Because of liquidity issues, the trading firms split large orders across several trading destinations to optimize…

Trading and Market Microstructure · Quantitative Finance 2010-07-28 Sophie Laruelle, Charles-Albert Lehalle, Gilles Pagès

Embarassingly Simple Dataset Distillation

Dataset distillation extracts a small set of synthetic training samples from a large dataset with the goal of achieving competitive performance on test data when trained on this sample. In this work, we tackle dataset distillation at its…

Machine Learning · Computer Science 2023-11-14 Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe

Distributed stochastic optimization via correlated scheduling

This paper considers a problem where multiple users make repeated decisions based on their own observed events. The events and decisions at each time step determine the values of a utility function and a collection of penalty functions. The…

Optimization and Control · Mathematics 2013-05-13 Michael J. Neely

Feeding the multitude: A polynomial-time algorithm to improve sampling

A wide variety of optimization techniques, both exact and heuristic, tend to be biased samplers. This means that when attempting to find multiple uncorrelated solutions of a degenerate Boolean optimization problem a subset of the solution…

Disordered Systems and Neural Networks · Physics 2019-05-14 Andrew J. Ochoa, Darryl C. Jacob, Salvatore Mandrà, Helmut G. Katzgraber

Pooling Image Datasets With Multiple Covariate Shift and Imbalance

Small sample sizes are common in many disciplines, which necessitates pooling roughly similar datasets across multiple institutions to study weak but relevant associations between images and disease outcomes. Such data often manifest…

Machine Learning · Computer Science 2024-11-19 Sotirios Panagiotis Chytas, Vishnu Suresh Lokhande, Peiran Li, Vikas Singh

Improving optimal subsampling through stratification

Recent works have proposed optimal subsampling algorithms to improve computational efficiency in large datasets and to design validation studies in the presence of measurement error. Existing approaches generally fall into two categories:…

Methodology · Statistics 2025-12-25 Jasper B. Yang, Thomas Lumley, Bryan E. Shepherd, Pamela A. Shaw