Related papers: Consistent Sampling with Replacement
Consistent sampling is a technique for specifying, in small space, a subset $S$ of a potentially large universe $U$ such that the elements in $S$ satisfy a suitably chosen sampling condition. Given a subset $\mathcal{I}\subseteq U$ it…
Faced with massive data, subsampling is a commonly used technique to improve computational efficiency, and using nonuniform subsampling probabilities is an effective approach to improve estimation efficiency. For computational efficiency,…
Sampling is a fundamental technique, and sampling without replacement is often desirable when duplicate samples are not beneficial. Within machine learning, sampling is useful for generating diverse outputs from a trained model. We present…
Information retrieval systems are usually measured by labeling the relevance of results corresponding to a sample of user queries. In practical search engines, such measurement needs to be performed continuously, such as daily or weekly.…
Sequential sampling occurs when the entire population is not known in advance and data are obtained one at a time or in groups of units. This manuscript proposes a new algorithm to sequentially select a balanced sample. The algorithm…
Sampling techniques are used in many fields, including design of experiments, image processing, and graphics. The techniques in each field are designed to meet the constraints specific to that field such as uniform coverage of the range of…
In this work, we present a new random sampling method for data streams where the probability of an element's inclusion in the sample is proportional to a weight associated with that element. Our method is based on sampling with replacement,…
Subsampling methods aim to select a subsample as a surrogate for the observed sample. As a powerful technique for large-scale data analysis, various subsampling methods are developed for more effective coefficient estimation and model…
We introduce dynamic nested sampling: a generalisation of the nested sampling algorithm in which the number of "live points" varies to allocate samples more efficiently. In empirical tests the new method significantly improves calculation…
Consider the fundamental problem of drawing a simple random sample of size k without replacement from [n] := {1, . . . , n}. Although a number of classical algorithms exist for this problem, we construct algorithms that are even simpler,…
A new unequal probability sampling method is proposed. This method is sequential. The decision to select or not each unit is made based on the order in which the units appear. A variant of this method allows selecting a sample from a…
We revisit the classical result in finite population sampling which states that in equally-likely "simple" random sampling the sample mean is more reliable when we do not replace after each draw. In this paper we investigate if and when the…
Recent stochastic gradient methods that have appeared in the literature base their efficiency and global convergence properties on a suitable control of the variance of the gradient batch estimate. This control is typically achieved by…
Sampling is an important tool for estimating large, complex sums and integrals over high dimensional spaces. For instance, important sampling has been used as an alternative to exact methods for inference in belief networks. Ideally, we…
This paper addresses the problem of estimating the containment and similarity between two sets using only random samples from each set, without relying on sketches of full sets. The study introduces a binomial model for predicting the…
Compressive sampling has become a widely used approach to construct polynomial chaos surrogates when the number of available simulation samples is limited. Originally, these expensive simulation samples would be obtained at random locations…
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency - poll the LLM multiple times and output the most frequent solution. Existing Self-Consistency techniques always generate a…
Suppose one desires to randomly sample a pair of objects such as socks, hoping to get a matching pair. Even in the simplest situation for sampling, which is sampling with replacement, the innocent phrase "the distribution of the color of a…
Sequential Monte Carlo (SMC) samplers are powerful tools for Bayesian inference but suffer from high computational costs due to their reliance on large particle ensembles for accurate estimates. We introduce persistent sampling (PS), an…
This paper presents a novel algorithm solving the classic problem of generating a random sample of size s from population of size n with non-uniform probabilities. The sampling is done with replacement. The algorithm requires constant…