Related papers: Stream Sampling with Immediate Decision

Sequential Unequal Probability Sampling For Stream Population

A new unequal probability sampling method is proposed. This method is sequential. The decision to select or not each unit is made based on the order in which the units appear. A variant of this method allows selecting a sample from a…

Methodology · Statistics 2021-11-17 Bardia Panahbehagh, Raphaël Jauslin, Yves Tillé

Taking snapshots from a stream

This work is devoted to a certain class of probabilistic snapshots for elements of the observed data stream. We show you how one can control their probabilistic properties and we show some potential applications. Our solution can be used to…

Information Retrieval · Computer Science 2022-06-24 Dominik Bojko, Jacek Cichoń

Distinct Sampling on Streaming Data with Near-Duplicates

In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, given that…

Data Structures and Algorithms · Computer Science 2018-10-31 Jiecao Chen, Qin Zhang

Finding Favourite Tuples on Data Streams with Provably Few Comparisons

One of the most fundamental tasks in data science is to assist a user with unknown preferences in finding high-utility tuples within a large database. To accurately elicit the unknown user preferences, a widely-adopted way is by asking the…

Databases · Computer Science 2023-07-07 Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Parallel Streaming Random Sampling

This paper investigates parallel random sampling from a potentially-unending data stream whose elements are revealed in a series of element sequences (minibatches). While sampling from a stream was extensively studied sequentially, not much…

Data Structures and Algorithms · Computer Science 2019-06-11 Kanat Tangwongsan, Srikanta Tirthapura

Streaming Algorithms from Precision Sampling

A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called…

Data Structures and Algorithms · Computer Science 2011-04-26 Alexandr Andoni, Robert Krauthgamer, Krzysztof Onak

Estimating Coverage in Streams via a Modified CVM Method

When individuals in a population can be classified in classes or categories, the coverage of a sample, $C$, is defined as the probability that a randomly selected individual from the population belongs to a class represented in the sample.…

Computation · Statistics 2025-04-08 Carlos Hernandez-Suarez

StreamSampling.jl: Efficient Sampling from Data Streams in Julia

StreamSampling.jl is a Julia library designed to provide general and efficient methods for sampling from data streams in a single pass, even when the total number of items is unknown. In this paper, we describe the capabilities of the…

Software Engineering · Computer Science 2026-05-15 Adriano Meligrana

Fair and Representative Subset Selection from Data Streams

We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be…

Data Structures and Algorithms · Computer Science 2021-02-15 Yanhao Wang, Francesco Fabbri, Michael Mathioudakis

Weighted Reservoir Sampling With Replacement from Data Streams

In this work, we present a new random sampling method for data streams where the probability of an element's inclusion in the sample is proportional to a weight associated with that element. Our method is based on sampling with replacement,…

Data Structures and Algorithms · Computer Science 2026-03-18 Adriano Meligrana, Adriano Fazzone

Subset Sampling over Joins

Subset sampling (also known as Poisson sampling), where the decision to include any specific element in the sample is made independently of all others, is a fundamental primitive in data analytics, enabling efficient approximation by…

Databases · Computer Science 2025-12-19 Aryan Esmailpour, Xiao Hu, Jinchao Huang, Stavros Sintos

Sequential Spatially Balanced Sampling

Sequential sampling occurs when the entire population is not known in advance and data are obtained one at a time or in groups of units. This manuscript proposes a new algorithm to sequentially select a balanced sample. The algorithm…

Methodology · Statistics 2023-01-04 Raphaël Jauslin, Bardia Panahbehagh, Yves Tillé

Capturing Data Uncertainty in High-Volume Stream Processing

We present the design and development of a data stream system that captures data uncertainty from data collection to query processing to final result generation. Our system focuses on data that is naturally modeled as continuous random…

Databases · Computer Science 2009-09-15 Yanlei Diao, Boduo Li, Anna Liu, Liping Peng, Charles Sutton, Thanh Tran, Michael Zink

Sampling to estimate arbitrary subset sums

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science 2007-05-23 Nick Duffield, Carsten Lund, Mikkel Thorup

Space-Efficient Sampling from Social Activity Streams

In order to efficiently study the characteristics of network domains and support development of network systems (e.g. algorithms, protocols that operate on networks), it is often necessary to sample a representative subgraph from a large…

Social and Information Networks · Computer Science 2012-06-22 Nesreen K. Ahmed, Jennifer Neville, Ramana Kompella

A Decision Method for Elementary Stream Calculus

The main result is a doubly exponential decision procedure for the first-order equality theory of streams with both arithmetic and control-oriented stream operations. This stream logic is expressive for elementary problems of stream…

Logic in Computer Science · Computer Science 2024-01-05 Harald Ruess

RPS: A Generic Reservoir Patterns Sampler

Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing…

Machine Learning · Computer Science 2024-11-04 Lamine Diop, Marc Plantevit, Arnaud Soulet

Ordering sampling rules for sequential anomaly identification under sampling constraints

We consider the problem of sequential anomaly identification over multiple independent data streams, under the presence of a sampling constraint. The goal is to quickly identify those that exhibit anomalous statistical behavior, when it is…

Statistics Theory · Mathematics 2025-12-23 Aristomenis Tsopelakos, Georgios Fellouris

Conditioning Normalizing Flows for Rare Event Sampling

Understanding the dynamics of complex molecular processes is often linked to the study of infrequent transitions between long-lived stable states. The standard approach to the sampling of such rare events is to generate an ensemble of…

Computational Physics · Physics 2023-05-22 Sebastian Falkner, Alessandro Coretti, Salvatore Romano, Phillip Geissler, Christoph Dellago

Reservoir Sampling over Joins

Sampling over joins is a fundamental task in large-scale data analytics. Instead of computing the full join results, which could be massive, a uniform sample of the join results would suffice for many purposes, such as answering analytical…

Databases · Computer Science 2024-04-11 Binyang Dai, Xiao Hu, Ke Yi