Related papers: Stream Aggregation Through Order Sampling

Weighted Reservoir Sampling With Replacement from Data Streams

In this work, we present a new random sampling method for data streams where the probability of an element's inclusion in the sample is proportional to a weight associated with that element. Our method is based on sampling with replacement,…

Data Structures and Algorithms · Computer Science · 2026-03-18 · Adriano Meligrana, Adriano Fazzone

RPS: A Generic Reservoir Patterns Sampler

Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing…

Machine Learning · Computer Science · 2024-11-04 · Lamine Diop, Marc Plantevit, Arnaud Soulet

Communication-Efficient (Weighted) Reservoir Sampling from Fully Distributed Data Streams

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our…

Data Structures and Algorithms · Computer Science · 2020-02-26 · Lorenz Hübschle-Schneider, Peter Sanders

Efficient sorting, duplicate removal, grouping, and aggregation

Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external…

Databases · Computer Science · 2022-09-27 · Thanh Do, Goetz Graefe, Jeffrey Naughton

Weighted Reservoir Sampling from Distributed Streams

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights…

Data Structures and Algorithms · Computer Science · 2019-04-09 · Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff

In-Order Sliding-Window Aggregation in Worst-Case Constant Time

Sliding-window aggregation is a widely-used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be expressed as binary operators that are associative but not necessarily…

Databases · Computer Science · 2020-09-30 · Kanat Tangwongsan, Martin Hirzel, Scott Schneider

Stabilizing Linear Passive-Aggressive Online Learning with Weighted Reservoir Sampling

Online learning methods, like the seminal Passive-Aggressive (PA) classifier, are still highly effective for high-dimensional streaming data, out-of-core processing, and other throughput-sensitive applications. Many such algorithms rely on…

Machine Learning · Computer Science · 2024-11-01 · Skyler Wu, Fred Lu, Edward Raff, James Holt

Prioritized Restreaming Algorithms for Balanced Graph Partitioning

Balanced graph partitioning is a critical step for many large-scale distributed computations with relational data. As graph datasets have grown in size and density, a range of highly-scalable balanced partitioning algorithms have appeared…

Social and Information Networks · Computer Science · 2020-07-08 · Amel Awadelkarim, Johan Ugander

Pattern Recognition and Event Detection on IoT Data-streams

Big data streams are possibly one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream…

Machine Learning · Computer Science · 2022-03-03 · Christos Karras, Aristeidis Karras, Spyros Sioutas

On Sampling from Massive Graph Streams

We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive streams of graph edges. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to…

Social and Information Networks · Computer Science · 2017-03-09 · Nesreen K. Ahmed, Nick Duffield, Theodore Willke, Ryan A. Rossi

Estimating Aggregate Properties on Probabilistic Streams

The probabilistic-stream model was introduced by Jayram et al. [JKV07]. It is a generalization of the data stream model that is suited to handling "probabilistic" data where each item of the stream represents a probability…

Data Structures and Algorithms · Computer Science · 2007-05-23 · Andrew McGregor, S. Muthukrishnan

Rank aggregation for non-stationary data streams

We consider the problem of learning over non-stationary ranking streams. The rankings can be interpreted as the preferences of a population and the non-stationarity means that the distribution of preferences changes over time. Our goal is…

Machine Learning · Statistics · 2020-10-28 · Ekhine Irurozki, Jesus Lobo, Aritz Perez, Javier Del Ser

Filter Distillation for Network Compression

In this paper we introduce Principal Filter Analysis (PFA), an easy to use and effective method for neural network compression. PFA exploits the correlation between filter responses within network layers to recommend a smaller network that…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-12 · Xavier Suau, Luca Zappella, Nicholas Apostoloff

Diba: A Re-configurable Stream Processor

Stream processing acceleration is driven by the continuously increasing volume and velocity of data generated on the Web and the limitations of storage, computation, and power consumption. Hardware solutions provide better performance and…

Databases · Computer Science · 2024-08-29 · Mohammadreza Najafi, Thamir M. Qadah, Mohammad Sadoghi, Hans-Arno Jacobsen

Sequential Unequal Probability Sampling For Stream Population

A new unequal probability sampling method is proposed. This method is sequential. The decision to select or not each unit is made based on the order in which the units appear. A variant of this method allows selecting a sample from a…

Methodology · Statistics · 2021-11-17 · Bardia Panahbehagh, Raphaël Jauslin, Yves Tillé

Can we aggregate human intelligence? an approach for human centric aggregation using ordered weighted averaging operators

The primary objective of this paper is to present an approach for recommender systems that can assimilate ranking to the voters or rankers so that recommendation can be made by giving priority to experts suggestion over usual…

Information Retrieval · Computer Science · 2021-05-04 · Shahab Saquib Sohail, Jamshed Siddiqui, Rashid Ali, S. Hamid Hasan, M. Afshar Alam

BOBA: A Parallel Lightweight Graph Reordering Algorithm with Heavyweight Implications

We describe a simple parallel-friendly lightweight graph reordering algorithm for COO graphs (edge lists). Our "Batched Order By Attachment" (BOBA) algorithm is linear in the number of edges in terms of reads and linear in the number of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-06-22 · Matthew Drescher, Muhammad A. Awad, Serban D. Porumbescu, John D. Owens

Active Weighted Aging Ensemble for Drifted Data Stream Classification

One of the significant problems of streaming data classification is the occurrence of concept drift, consisting of the change of probabilistic characteristics of the classification task. This phenomenon destabilizes the performance of the…

Machine Learning · Computer Science · 2021-12-21 · Michał Woźniak, Paweł Zyblewski, Paweł Ksieniewicz

[Experiments & Analysis] Hash-Based vs. Sort-Based Group-By-Aggregate: A Focused Empirical Study [Extended Version]

Group-by-aggregate (GBA) queries are integral to data analysis, allowing users to group data by specific attributes and apply aggregate functions such as sum, average, and count. Database Management Systems (DBMSs) typically execute GBA…

Databases · Computer Science · 2024-12-03 · Gaurav Vaghasiya, Shiva Jahangiri

Sampling to estimate arbitrary subset sums

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science · 2007-05-23 · Nick Duffield, Carsten Lund, Mikkel Thorup