Related papers: Weighted Random Sampling over Data Streams

Weighted Reservoir Sampling With Replacement from Data Streams

In this work, we present a new random sampling method for data streams where the probability of an element's inclusion in the sample is proportional to a weight associated with that element. Our method is based on sampling with replacement,…

Data Structures and Algorithms · Computer Science 2026-03-18 Adriano Meligrana, Adriano Fazzone

Weighted Reservoir Sampling from Distributed Streams

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights…

Data Structures and Algorithms · Computer Science 2019-04-09 Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff

Reinforcement Learning Enhanced Weighted Sampling for Accurate Subgraph Counting on Fully Dynamic Graph Streams

As the popularity of graph data increases, there is a growing need to count the occurrences of subgraph patterns of interest, for a variety of applications. Many graphs are massive in scale and also fully dynamic (with insertions and…

Databases · Computer Science 2022-11-15 Kaixin Wang, Cheng Long, Da Yan, Jie Zhang, H. V. Jagadish

RPS: A Generic Reservoir Patterns Sampler

Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing…

Machine Learning · Computer Science 2024-11-04 Lamine Diop, Marc Plantevit, Arnaud Soulet

Sampling to estimate arbitrary subset sums

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science 2007-05-23 Nick Duffield, Carsten Lund, Mikkel Thorup

Communication-Efficient (Weighted) Reservoir Sampling from Fully Distributed Data Streams

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our…

Data Structures and Algorithms · Computer Science 2020-02-26 Lorenz Hübschle-Schneider, Peter Sanders

An asymptotically optimal, online algorithm for weighted random sampling with replacement

This paper presents a novel algorithm solving the classic problem of generating a random sample of size s from a population of size n with non-uniform probabilities. The sampling is done with replacement. The algorithm requires constant…

Data Structures and Algorithms · Computer Science 2016-11-03 Michał Startek

Weighted Random Sampling over Joins

Joining records with all other records that meet a linkage condition can result in an astronomically large number of combinations due to many-to-many relationships. For such challenging (acyclic) joins, a random sample over the join result…

Databases · Computer Science 2022-01-11 Michael Shekelyan, Graham Cormode, Peter Triantafillou, Ali Shanghooshabad, Qingzhi Ma

Weighted Sampling Without Replacement from Data Streams

Weighted sampling without replacement has proved to be a very important tool in designing new algorithms. Efraimidis and Spirakis (IPL 2006) presented an algorithm for weighted sampling without replacement from data streams. Their algorithm…

Data Structures and Algorithms · Computer Science 2015-06-08 Vladimir Braverman, Rafail Ostrovsky, Gregory Vorsanger

On the variance of subset sum estimation

For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…

Data Structures and Algorithms · Computer Science 2007-05-23 Mario Szegedy, Mikkel Thorup

Optimal Representative Sample Weighting

We consider the problem of assigning weights to a set of samples or data records, with the goal of achieving a representative weighting, which happens when certain sample averages of the data are close to prescribed values. We frame the…

Machine Learning · Statistics 2020-05-20 Shane Barratt, Guillermo Angeris, Stephen Boyd

Weighted Random Search for CNN Hyperparameter Optimization

Nearly all model algorithms used in machine learning use two different sets of parameters: the training parameters and the meta-parameters (hyperparameters). While the training parameters are learned during the training phase, the values of…

Machine Learning · Computer Science 2020-03-31 Razvan Andonie, Adrian-Catalin Florea

Pattern Recognition and Event Detection on IoT Data-streams

Big data streams are possibly one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream…

Machine Learning · Computer Science 2022-03-03 Christos Karras, Aristeidis Karras, Spyros Sioutas

Coordinated Weighted Sampling for Estimating Aggregates Over Multiple Weight Assignments

Many data sources are naturally modeled by multiple weight assignments over a set of keys: snapshots of an evolving database at multiple points in time, measurements collected over multiple time periods, requests for resources served at…

Databases · Computer Science 2010-11-11 Edith Cohen, Haim Kaplan, Subhabrata Sen

Importance Weighted Transfer of Samples in Reinforcement Learning

We consider the transfer of experience samples (i.e., tuples < s, a, s', r >) in reinforcement learning (RL), collected from a set of source tasks to improve the learning process in a given target task. Most of the related approaches focus…

Machine Learning · Computer Science 2018-05-29 Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli

Unified Rules of Renewable Weighted Sums for Various Online Updating Estimations

This paper establishes unified frameworks of renewable weighted sums (RWS) for various online updating estimations in the models with streaming data sets. The newly defined RWS lays the foundation of online updating likelihood, online…

Methodology · Statistics 2020-08-21 Lu Lin, Weiyu Li, Jun Lu

Analyzing cellwise weighted data

Often the rows (cases, objects) of a dataset have weights. For instance, the weight of a case may reflect the number of times it has been observed, or its reliability. For analyzing such data many rowwise weighted techniques are available,…

Computation · Statistics 2024-07-08 Peter J. Rousseeuw

Robust Estimation for Multivariate Wrapped Models

A weighted likelihood technique for robust estimation of a multivariate Wrapped Normal distribution for data points scattered on a p-dimensional torus is proposed. The occurrence of outliers in the sample at hand can badly compromise…

Methodology · Statistics 2021-07-01 Giovanni Saraceno, Claudio Agostinelli, Luca Greco

A Weighted Likelihood Approach Based on Statistical Data Depths

We propose a general approach to construct weighted likelihood estimating equations with the aim of obtaining robust estimates. The weight, attached to each score contribution, is evaluated by comparing the statistical data depth at the model…

Methodology · Statistics 2018-02-16 Claudio Agostinelli

Stream sampling for variance-optimal estimation of subset sums

From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir…

Data Structures and Algorithms · Computer Science 2010-11-16 Edith Cohen, Nick Duffield, Haim Kaplan, Carsten Lund, Mikkel Thorup