
Related papers: Stream Aggregation Through Order Sampling

200 papers

In this work, we present a new random sampling method for data streams where the probability of an element's inclusion in the sample is proportional to a weight associated with that element. Our method is based on sampling with replacement,…

Data Structures and Algorithms · Computer Science 2026-03-18 Adriano Meligrana, Adriano Fazzone

Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing…

Machine Learning · Computer Science 2024-11-04 Lamine Diop, Marc Plantevit, Arnaud Soulet

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our…

Data Structures and Algorithms · Computer Science 2020-02-26 Lorenz Hübschle-Schneider, Peter Sanders

Database query processing requires algorithms for duplicate removal, grouping, and aggregation. Three algorithms exist: in-stream aggregation is most efficient by far but requires sorted input; sort-based aggregation relies on external…

Databases · Computer Science 2022-09-27 Thanh Do, Goetz Graefe, Jeffrey Naughton

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights…

Data Structures and Algorithms · Computer Science 2019-04-09 Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff

Sliding-window aggregation is a widely-used approach for extracting insights from the most recent portion of a data stream. The aggregations of interest can usually be expressed as binary operators that are associative but not necessarily…

Databases · Computer Science 2020-09-30 Kanat Tangwongsan, Martin Hirzel, Scott Schneider

Online learning methods, like the seminal Passive-Aggressive (PA) classifier, are still highly effective for high-dimensional streaming data, out-of-core processing, and other throughput-sensitive applications. Many such algorithms rely on…

Machine Learning · Computer Science 2024-11-01 Skyler Wu, Fred Lu, Edward Raff, James Holt

Balanced graph partitioning is a critical step for many large-scale distributed computations with relational data. As graph datasets have grown in size and density, a range of highly-scalable balanced partitioning algorithms have appeared…

Social and Information Networks · Computer Science 2020-07-08 Amel Awadelkarim, Johan Ugander

Big data streams are possibly one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream…

Machine Learning · Computer Science 2022-03-03 Christos Karras, Aristeidis Karras, Spyros Sioutas

We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive streams of graph edges. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to…

Social and Information Networks · Computer Science 2017-03-09 Nesreen K. Ahmed, Nick Duffield, Theodore Willke, Ryan A. Rossi

The probabilistic-stream model was introduced by Jayram et al. [JKV07]. It is a generalization of the data stream model that is suited to handling "probabilistic" data where each item of the stream represents a probability…

Data Structures and Algorithms · Computer Science 2007-05-23 Andrew McGregor, S. Muthukrishnan

We consider the problem of learning over non-stationary ranking streams. The rankings can be interpreted as the preferences of a population and the non-stationarity means that the distribution of preferences changes over time. Our goal is…

Machine Learning · Statistics 2020-10-28 Ekhine Irurozki, Jesus Lobo, Aritz Perez, Javier Del Ser

In this paper we introduce Principal Filter Analysis (PFA), an easy to use and effective method for neural network compression. PFA exploits the correlation between filter responses within network layers to recommend a smaller network that…

Computer Vision and Pattern Recognition · Computer Science 2019-12-12 Xavier Suau, Luca Zappella, Nicholas Apostoloff

Stream processing acceleration is driven by the continuously increasing volume and velocity of data generated on the Web and the limitations of storage, computation, and power consumption. Hardware solutions provide better performance and…

Databases · Computer Science 2024-08-29 Mohammadreza Najafi, Thamir M. Qadah, Mohammad Sadoghi, Hans-Arno Jacobsen

A new unequal probability sampling method is proposed. This method is sequential. The decision to select or not each unit is made based on the order in which the units appear. A variant of this method allows selecting a sample from a…

Methodology · Statistics 2021-11-17 Bardia Panahbehagh, Raphaël Jauslin, Yves Tillé

The primary objective of this paper is to present an approach for recommender systems that can assimilate ranking to the voters or rankers so that recommendation can be made by giving priority to experts suggestion over usual…

Information Retrieval · Computer Science 2021-05-04 Shahab Saquib Sohail, Jamshed Siddiqui, Rashid Ali, S. Hamid Hasan, M. Afshar Alam

We describe a simple parallel-friendly lightweight graph reordering algorithm for COO graphs (edge lists). Our "Batched Order By Attachment" (BOBA) algorithm is linear in the number of edges in terms of reads and linear in the number of…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-22 Matthew Drescher, Muhammad A. Awad, Serban D. Porumbescu, John D. Owens

One of the significant problems of streaming data classification is the occurrence of concept drift, consisting of the change of probabilistic characteristics of the classification task. This phenomenon destabilizes the performance of the…

Machine Learning · Computer Science 2021-12-21 Michał Woźniak, Paweł Zyblewski, Paweł Ksieniewicz

Group-by-aggregate (GBA) queries are integral to data analysis, allowing users to group data by specific attributes and apply aggregate functions such as sum, average, and count. Database Management Systems (DBMSs) typically execute GBA…

Databases · Computer Science 2024-12-03 Gaurav Vaghasiya, Shiva Jahangiri

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science 2007-05-23 Nick Duffield, Carsten Lund, Mikkel Thorup
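The last entry's summary is cut off before it describes how priority sampling works. For orientation, here is a minimal Python sketch of priority sampling in its commonly stated form. It is an illustration under stated assumptions, not the authors' code:

- Each item of weight w gets a priority w/u, where u is uniform on (0, 1].
- The k items with the highest priorities are kept.
- The (k+1)-th highest priority is used as a threshold τ.
- Each kept item's weight is estimated as max(w, τ).
- The total weight of an arbitrary subset is estimated by summing the estimates of the sampled items that fall in it.

The helper names `priority_sample` and `estimate_subset` are hypothetical. The sketch also assumes all weights are positive.

```python
import heapq
import random
from typing import Callable, Dict, Hashable, Iterable, Tuple


def priority_sample(
    items: Iterable[Tuple[Hashable, float]], k: int, rng: random.Random
) -> Dict[Hashable, float]:
    """Hypothetical helper: one-pass priority sample of size k.

    Returns a dict mapping each sampled key to its weight estimate.
    Assumes every weight is positive.
    """
    # Min-heap holding the k+1 largest priorities seen so far.
    # The arrival index breaks ties, so keys are never compared.
    heap = []
    for idx, (key, w) in enumerate(items):
        u = 1.0 - rng.random()  # uniform on (0, 1], so no division by zero
        q = w / u  # priority of this item
        entry = (q, idx, key, w)
        if len(heap) < k + 1:
            heapq.heappush(heap, entry)
        elif q > heap[0][0]:
            heapq.heapreplace(heap, entry)
    if len(heap) <= k:
        # At most k items arrived: keep them all with their exact weights.
        return {key: w for _, _, key, w in heap}
    # The smallest of the k+1 retained priorities is the threshold tau.
    tau = heapq.heappop(heap)[0]
    return {key: max(w, tau) for _, _, key, w in heap}


def estimate_subset(
    sample: Dict[Hashable, float], in_subset: Callable[[Hashable], bool]
) -> float:
    """Estimate the total weight of items whose key satisfies in_subset."""
    return sum(est for key, est in sample.items() if in_subset(key))


# Usage: sample 3 of 10 weighted items, then estimate the weight of the even keys.
stream = [(i, float(i)) for i in range(1, 11)]
sample = priority_sample(stream, k=3, rng=random.Random(0))
print(estimate_subset(sample, lambda key: key % 2 == 0))
```

A heap of size k+1 keeps the pass single-scan with O(log k) work per item. That is why this style of sampling fits the streaming setting discussed throughout this list.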