Related papers: Instance-Optimality in I/O-Efficient Sampling and …

Stochastic optimization with arbitrary recurrent data sampling

To obtain an optimal first-order convergence guarantee for stochastic optimization, it is necessary to use a recurrent data sampling algorithm that samples every data point with sufficient frequency. Most commonly used data sampling…

Optimization and Control · Mathematics 2024-07-23 William G. Powell, Hanbaek Lyu

Batch mode active learning for efficient parameter estimation

For many data analysis tasks, we may only have information on the explanatory variable, and evaluating the response values is quite expensive. While it is impractical or too costly to obtain the responses of all units, a…

Computation · Statistics 2023-04-07 Wei Zheng, Ting Tian, Xueqin Wang

Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information

Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data too large to store or manipulate and meet resource constraints on bandwidth or battery power. Estimators that are applied to the…

Databases · Computer Science 2015-03-19 Edith Cohen, Haim Kaplan

Instance Dependent Testing of Samplers using Interval Conditioning

Sampling algorithms play a pivotal role in probabilistic AI. However, verifying if a sampler program indeed samples from the claimed distribution is a notoriously hard problem. Provably correct testers like Barbarik, Teq, Flash, CubeProbe…

Data Structures and Algorithms · Computer Science 2025-12-09 Rishiraj Bhattacharyya, Sourav Chakraborty, Yash Pote, Uddalok Sarkar, Sayantan Sen

On the Optimality of Averaging in Distributed Statistical Learning

A common approach to statistical learning with big data is to randomly split it among $m$ machines and learn the parameter of interest by averaging the $m$ individual estimates. In this paper, focusing on empirical risk minimization, or…

Machine Learning · Statistics 2016-06-14 Jonathan Rosenblatt, Boaz Nadler

Instance-Optimal Private Density Estimation in the Wasserstein Distance

Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating…

Machine Learning · Computer Science 2024-07-01 Vitaly Feldman, Audra McMillan, Satchit Sivakumar, Kunal Talwar

Subset Sampling and Its Extensions

This paper studies the subset sampling problem. The input is a set $\mathcal{S}$ of $n$ records together with a function $\textbf{p}$ that assigns each record $v\in\mathcal{S}$ a probability $\textbf{p}(v)$. A query returns a random…

Data Structures and Algorithms · Computer Science 2023-07-24 Jinchao Huang, Sibo Wang

Efficiently Extracting Randomness from Imperfect Stochastic Processes

We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from non-ideal stochastic processes. The related interval algorithm proposed by Han and Hoshi has asymptotically…

Information Theory · Computer Science 2012-09-05 Hongchao Zhou, Jehoshua Bruck

Stochastic Learning under Random Reshuffling with Constant Step-sizes

In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly. Recent works…

Machine Learning · Computer Science 2019-01-30 Bicheng Ying, Kun Yuan, Stefan Vlaski, Ali H. Sayed

Optimal quantum sampling on distributed databases

Quantum sampling, a fundamental subroutine in numerous quantum algorithms, involves encoding a given probability distribution in the amplitudes of a pure state. Given the hefty cost of large-scale quantum storage, we initiate the study of…

Quantum Physics · Physics 2025-06-10 Longyun Chen, Jingcheng Liu, Penghui Yao

An Instance-optimal Algorithm for Bichromatic Rectangular Visibility

Afshani, Barbay and Chan (2017) introduced the notion of instance-optimal algorithm in the order-oblivious setting. An algorithm A is instance-optimal in the order-oblivious setting for a certain class of algorithms A* if the following…

Computational Geometry · Computer Science 2023-07-28 Jean Cardinal, Justin Dallant, John Iacono

Subsampling Algorithms for Semidefinite Programming

We derive a stochastic gradient algorithm for semidefinite optimization using randomization techniques. The algorithm uses subsampling to reduce the computational cost of each iteration and the subsampling ratio explicitly controls…

Optimization and Control · Mathematics 2011-08-30 Alexandre d'Aspremont

Efficient Random Sampling -- Parallel, Vectorized, Cache-Efficient, and Online

We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and…

Data Structures and Algorithms · Computer Science 2019-11-18 Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, Carsten Dachsbacher

A Sequential Deep Learning Algorithm for Sampled Mixed-integer Optimisation Problems

Mixed-integer optimisation problems can be computationally challenging. Here, we introduce and analyse two efficient algorithms with a specific sequential design that are aimed at dealing with sampled problems within this class. At each…

Optimization and Control · Mathematics 2023-03-07 Mohammadreza Chamanbaz, Roland Bouffanais

Near Optimal Inference for the Best-Performing Algorithm

Consider a collection of competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best performing algorithm. Specifically, which algorithm is most likely to rank highest on a…

Machine Learning · Computer Science 2025-08-08 Amichai Painsky

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang, Diego Klabjan

Estimation from Partially Sampled Distributed Traces

Sampling is often a necessary evil to reduce the processing and storage costs of distributed tracing. In this work, we describe a scalable and adaptive sampling approach that can preserve events of interest better than the widely used…

Data Structures and Algorithms · Computer Science 2021-07-19 Otmar Ertl

Parallel optimized sampling for stochastic equations

Stochastic equations play an important role in computational science, due to their ability to treat a wide variety of complex statistical problems. However, current algorithms are strongly limited by their sampling variance, which scales…

Numerical Analysis · Mathematics 2017-01-04 Bogdan Opanchuk, Simon Kiesewetter, Peter D. Drummond

Selection of the Most Probable Best

We consider an expected-value ranking and selection (R&S) problem where all k solutions' simulation outputs depend on a common parameter whose uncertainty can be modeled by a distribution. We define the most probable best (MPB) to be the…

Methodology · Statistics 2024-04-23 Taeho Kim, Kyoung-kuk Kim, Eunhye Song

Instance Optimal Learning

We consider the following basic learning task: given independent draws from an unknown distribution over a discrete support, output an approximation of the distribution that is as accurate as possible in $\ell_1$ distance (i.e. total…

Machine Learning · Computer Science 2015-11-12 Gregory Valiant, Paul Valiant