Related papers: CacheDiff: Fast Random Sampling
In this work, we propose FastDPM, a unified framework for fast sampling in diffusion probabilistic models. FastDPM generalizes previous methods and gives rise to new algorithms with improved sample quality. We systematically investigate the…
Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…
Determinantal Point Processes (DPPs) are elegant probabilistic models of repulsion and diversity over discrete sets of items. But their applicability to large sets is hindered by expensive cubic-complexity matrix operations for basic tasks…
Caching is an approach to smoothen the variability of traffic over time. Recently it has been proved that the local memories at the users can be exploited for reducing the peak traffic in a much more efficient way than previously believed.…
Consistent sampling is a technique for specifying, in small space, a subset $S$ of a potentially large universe $U$ such that the elements in $S$ satisfy a suitably chosen sampling condition. Given a subset $\mathcal{I}\subseteq U$ it…
Binary segmentation is the classic greedy algorithm which recursively splits a sequential data set by optimizing some loss or likelihood function. Binary segmentation is widely used for changepoint detection in data sets measured over space…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…
We present the first feasible method for sampling a dynamic data stream with deletions, where the sample consists of pairs $(k,C_k)$ of a value $k$ and its exact total count $C_k$. Our algorithms are for both Strict Turnstile data streams…
In this paper we combine the Alias method with the concept of systematic sampling, a method commonly used in particle filters for efficient low-variance resampling. The proposed method allows very fast sampling from a discrete distribution:…
We present a new approach for random sampling of contingency tables of any size and constraints based on a recently introduced $\textit{probabilistic divide-and-conquer}$ technique. A simple exact sampling algorithm is presented for…
We propose a method to exponentially speed up computation of various fingerprints, such as the ones used to compute similarity and rarity in massive data sets. Rather then maintaining the full stream of $b$ items of a universe $[u]$, such…
We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be…
We present space efficient Monte Carlo algorithms that solve Subset Sum and Knapsack instances with $n$ items using $O^*(2^{0.86n})$ time and polynomial space, where the $O^*(\cdot)$ notation suppresses factors polynomial in the input size.…
This paper studies the sample complexity (aka number of comparisons) bounds for the active best-$k$ items selection from pairwise comparisons. From a given set of items, the learner can make pairwise comparisons on every pair of items, and…
This paper presents a novel algorithm solving the classic problem of generating a random sample of size s from population of size n with non-uniform probabilities. The sampling is done with replacement. The algorithm requires constant…
The problem of estimating the number $n$ of distinct keys of a large collection of $N$ data is well known in computer science. A classical algorithm is the adaptive sampling (AS). $n$ can be estimated by $R.2^D$, where $R$ is the final…
We present a novel technique for constructing differentiable order-type operations, including soft ranking, soft top-k selection, and soft permutations. Our approach leverages an efficient closed-form formula for the inverse of the function…
We consider a variant of the online caching problem where the items exhibit dependencies among each other: an item can reside in the cache only if all its dependent items are also in the cache. The dependency relations can form any directed…
Path planning for 3D solid objects is a challenging problem, requiring a search in a six-dimensional configuration space, which is, nevertheless, essential in many robotic applications such as bin-picking and assembly. The commonly used…