Related papers: CacheDiff: Fast Random Sampling

On Fast Sampling of Diffusion Probabilistic Models

In this work, we propose FastDPM, a unified framework for fast sampling in diffusion probabilistic models. FastDPM generalizes previous methods and gives rise to new algorithms with improved sample quality. We systematically investigate the…

Machine Learning · Computer Science · 2021-06-25 · Zhifeng Kong, Wei Ping

Sampling to estimate arbitrary subset sums

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science · 2007-05-23 · Nick Duffield, Carsten Lund, Mikkel Thorup

Efficient Sampling for k-Determinantal Point Processes

Determinantal Point Processes (DPPs) are elegant probabilistic models of repulsion and diversity over discrete sets of items. But their applicability to large sets is hindered by expensive cubic-complexity matrix operations for basic tasks…

Machine Learning · Computer Science · 2016-05-31 · Chengtao Li, Stefanie Jegelka, Suvrit Sra

$K$ Users Caching Two Files: An Improved Achievable Rate

Caching is an approach to smoothen the variability of traffic over time. Recently it has been proved that the local memories at the users can be exploited for reducing the peak traffic in a much more efficient way than previously believed.…

Information Theory · Computer Science · 2015-12-22 · Saeid Sahraei, Michael Gastpar

Consistent Subset Sampling

Consistent sampling is a technique for specifying, in small space, a subset $S$ of a potentially large universe $U$ such that the elements in $S$ satisfy a suitably chosen sampling condition. Given a subset $\mathcal{I}\subseteq U$ it…

Data Structures and Algorithms · Computer Science · 2014-04-21 · Konstantin Kutzkov, Rasmus Pagh

Finite Sample Complexity Analysis of Binary Segmentation

Binary segmentation is the classic greedy algorithm which recursively splits a sequential data set by optimizing some loss or likelihood function. Binary segmentation is widely used for changepoint detection in data sets measured over space…

Machine Learning · Computer Science · 2024-10-14 · Toby Dylan Hocking

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science · 2022-03-30 · Georgios Exarchakis, Omar Oubari, Gregor Lenz

Diversity Subsampling: Custom Subsamples from Large Data Sets

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics · 2023-11-27 · Boyang Shang, Daniel W. Apley, Sanjay Mehrotra

Feasible Sampling of Non-strict Turnstile Data Streams

We present the first feasible method for sampling a dynamic data stream with deletions, where the sample consists of pairs $(k,C_k)$ of a value $k$ and its exact total count $C_k$. Our algorithms are for both Strict Turnstile data streams…

Data Structures and Algorithms · Computer Science · 2012-09-26 · Neta Barkay, Ely Porat, Bar Shalem

Systematic Alias Sampling: an efficient and low-variance way to sample from a discrete distribution

In this paper we combine the Alias method with the concept of systematic sampling, a method commonly used in particle filters for efficient low-variance resampling. The proposed method allows very fast sampling from a discrete distribution:…

Data Structures and Algorithms · Computer Science · 2025-09-30 · Ilari Vallivaara, Katja Poikselkä, Pauli Rikula, Juha Röning

Random Sampling of Contingency Tables via Probabilistic Divide-and-Conquer

We present a new approach for random sampling of contingency tables of any size and constraints based on a recently introduced probabilistic divide-and-conquer technique. A simple exact sampling algorithm is presented for…

Statistics Theory · Mathematics · 2016-03-01 · Stephen DeSalvo, James Y. Zhao

Fast Pseudo-Random Fingerprints

We propose a method to exponentially speed up computation of various fingerprints, such as the ones used to compute similarity and rarity in massive data sets. Rather than maintaining the full stream of $b$ items of a universe $[u]$, such…

Data Structures and Algorithms · Computer Science · 2010-09-30 · Yoram Bachrach, Ely Porat

Fair and Representative Subset Selection from Data Streams

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be…

Data Structures and Algorithms · Computer Science · 2021-02-15 · Yanhao Wang, Francesco Fabbri, Michael Mathioudakis

Faster Space-Efficient Algorithms for Subset Sum, k-Sum and Related Problems

We present space efficient Monte Carlo algorithms that solve Subset Sum and Knapsack instances with $n$ items using $O^*(2^{0.86n})$ time and polynomial space, where the $O^*(\cdot)$ notation suppresses factors polynomial in the input size.…

Data Structures and Algorithms · Computer Science · 2017-06-27 · Nikhil Bansal, Shashwat Garg, Jesper Nederlof, Nikhil Vyas

The Sample Complexity of Best-$k$ Items Selection from Pairwise Comparisons

This paper studies the sample complexity (aka number of comparisons) bounds for the active best-$k$ items selection from pairwise comparisons. From a given set of items, the learner can make pairwise comparisons on every pair of items, and…

Machine Learning · Computer Science · 2021-08-02 · Wenbo Ren, Jia Liu, Ness B. Shroff

An asymptotically optimal, online algorithm for weighted random sampling with replacement

This paper presents a novel algorithm solving the classic problem of generating a random sample of size $s$ from a population of size $n$ with non-uniform probabilities. The sampling is done with replacement. The algorithm requires constant…

Data Structures and Algorithms · Computer Science · 2016-11-03 · Michał Startek

The Adaptive Sampling Revisited

The problem of estimating the number $n$ of distinct keys in a large collection of $N$ data items is well known in computer science. A classical algorithm is adaptive sampling (AS). $n$ can be estimated by $R \cdot 2^D$, where $R$ is the final…

Data Structures and Algorithms · Computer Science · 2019-05-17 · Matthew Drescher, Guy Louchard, Yvik Swan

LapSum -- One Method to Differentiate Them All: Ranking, Sorting and Top-k Selection

We present a novel technique for constructing differentiable order-type operations, including soft ranking, soft top-k selection, and soft permutations. Our approach leverages an efficient closed-form formula for the inverse of the function…

Artificial Intelligence · Computer Science · 2025-09-04 · Łukasz Struski, Michał B. Bednarczyk, Igor T. Podolak, Jacek Tabor

Dependency-Aware Online Caching

We consider a variant of the online caching problem where the items exhibit dependencies among each other: an item can reside in the cache only if all its dependent items are also in the cache. The dependency relations can form any directed…

Data Structures and Algorithms · Computer Science · 2024-01-31 · Julien Dallot, Amirmehdi Jafari Fesharaki, Maciej Pacut, Stefan Schmid

Enhancing Sampling-based Planning with a Library of Paths

Path planning for 3D solid objects is a challenging problem, requiring a search in a six-dimensional configuration space, which is, nevertheless, essential in many robotic applications such as bin-picking and assembly. The commonly used…

Robotics · Computer Science · 2026-01-09 · Michal Minařík, Vojtěch Vonásek, Robert Pěnička