Related papers: Consistent Subset Sampling

Consistent Sampling with Replacement

We describe a very simple method for 'consistent sampling' that allows for sampling with replacement. The method extends previous approaches to consistent sampling, which assign a pseudorandom real number to each element, and sample those…

Data Structures and Algorithms · Computer Science 2018-08-31 Ronald L. Rivest

Mining Top-K Frequent Itemsets Through Progressive Sampling

We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at most w. To this purpose, we define an approximation to the top-K frequent itemsets to be a family of itemsets which includes (resp., excludes)…

Data Structures and Algorithms · Computer Science 2012-04-23 Andrea Pietracaprina, Matteo Riondato, Eli Upfal, Fabio Vandin

On the variance of subset sum estimation

For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…

Data Structures and Algorithms · Computer Science 2007-05-23 Mario Szegedy, Mikkel Thorup

Stream sampling for variance-optimal estimation of subset sums

From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir…

Data Structures and Algorithms · Computer Science 2010-11-16 Edith Cohen, Nick Duffield, Haim Kaplan, Carsten Lund, Mikkel Thorup

Sequential Spatially Balanced Sampling

Sequential sampling occurs when the entire population is not known in advance and data are obtained one at a time or in groups of units. This manuscript proposes a new algorithm to sequentially select a balanced sample. The algorithm…

Methodology · Statistics 2023-01-04 Raphaël Jauslin, Bardia Panahbehagh, Yves Tillé

Sampling to estimate arbitrary subset sums

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science 2007-05-23 Nick Duffield, Carsten Lund, Mikkel Thorup

Effective Sampling: Fast Segmentation Using Robust Geometric Model Fitting

Identifying the underlying models in a set of data points contaminated by noise and outliers, leads to a highly complex multi-model fitting problem. This problem can be posed as a clustering problem by the projection of higher order…

Computer Vision and Pattern Recognition · Computer Science 2018-08-01 Ruwan Tennakoon, Alireza Sadri, Reza Hoseinnezhad, Alireza Bab-Hadiashar

Distinct Sampling on Streaming Data with Near-Duplicates

In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, given that…

Data Structures and Algorithms · Computer Science 2018-10-31 Jiecao Chen, Qin Zhang

Nearly Consistent Finite Particle Estimates in Streaming Importance Sampling

In Bayesian inference, we seek to compute information about random variables such as moments or quantiles on the basis of available data and prior information. When the distribution of random variables is intractable, Monte Carlo (MC)…

Statistics Theory · Mathematics 2021-04-06 Alec Koppel, Amrit Singh Bedi, Brian M. Sadler, Victor Elvira

Scalable subsampling: computation, aggregation and inference

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat{\theta}_n$ in order to conduct nonparametric inference such as the construction of confidence intervals…

Statistics Theory · Mathematics 2021-12-14 Dimitris N. Politis

The Effectiveness of Uniform Sampling for Center-Based Clustering with Outliers

Clustering has many important applications in computer science, but real-world datasets often contain outliers. Moreover, the presence of outliers can make the clustering problems much more challenging. To reduce the complexities,…

Data Structures and Algorithms · Computer Science 2020-05-04 Hu Ding, Jiawei Huang, Haikuo Yu

Single-Step Consistent Diffusion Samplers

Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps to produce high-quality samples, leading to high…

Machine Learning · Computer Science 2025-02-17 Pascal Jutras-Dubé, Patrick Pynadath, Ruqi Zhang

Sampling in Space Restricted Settings

Space efficient algorithms play a central role in dealing with large amount of data. In such settings, one would like to analyse the large data using small amount of "working space". One of the key steps in many algorithms for analysing…

Data Structures and Algorithms · Computer Science 2015-01-19 Anup Bhattacharya, Davis Issac, Ragesh Jaiswal, Amit Kumar

Space Lower Bounds for Itemset Frequency Sketches

Given a database, computing the fraction of rows that contain a query itemset or determining whether this fraction is above some threshold are fundamental operations in data mining. A uniform sample of rows is a good sketch of the database…

Data Structures and Algorithms · Computer Science 2016-03-10 Edo Liberty, Michael Mitzenmacher, Justin Thaler, Jonathan Ullman

Efficient Sampling Policy for Selecting a Good Enough Subset

The note studies the problem of selecting a good enough subset out of a finite number of alternatives under a fixed simulation budget. Our work aims to maximize the posterior probability of correctly selecting a good subset. We formulate…

Optimization and Control · Mathematics 2023-05-09 Gongbo Zhang, Bin Chen, Qing-shan Jia, Yijie Peng

COMBSS: Best Subset Selection via Continuous Optimization

The problem of best subset selection in linear regression is considered with the aim to find a fixed size subset of features that best fits the response. This is particularly challenging when the total available number of features is very…

Methodology · Statistics 2023-11-28 Sarat Moka, Benoit Liquet, Houying Zhu, Samuel Muller

The Sample Allocation Problem and Non-Uniform Compressive Sampling

This paper discusses sample allocation problem (SAP) in frequency-domain Compressive Sampling (CS) of time-domain signals. An analysis that is relied on two fundamental CS principles; the Uniform Random Sampling (URS) and the Uncertainty…

Information Theory · Computer Science 2014-12-22 Andriyan B. Suksmono

Consistent estimation of non-bandlimited spectral density from uniformly spaced samples

In the matter of selection of sample time points for the estimation of the power spectral density of a continuous time stationary stochastic process, irregular sampling schemes such as Poisson sampling are often preferred over regular…

Statistics Theory · Mathematics 2010-07-19 Radhendushka Srivastava, Debasis Sengupta

Convergence Of Consistency Model With Multistep Sampling Under General Data Assumptions

Diffusion models accomplish remarkable success in data generation tasks across various domains. However, the iterative sampling process is computationally expensive. Consistency models are proposed to learn consistency functions to map from…

Machine Learning · Computer Science 2025-05-07 Yiding Chen, Yiyi Zhang, Owen Oertell, Wen Sun

Structured sampling and fast reconstruction of smooth graph signals

This work concerns sampling of smooth signals on arbitrary graphs. We first study a structured sampling strategy for such smooth graph signals that consists of a random selection of few pre-defined groups of nodes. The number of groups to…

Social and Information Networks · Computer Science 2017-05-08 Gilles Puy, Patrick Pérez