Related papers: Parallel Streaming Random Sampling

200 papers

In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, given that…

Data Structures and Algorithms · Computer Science 2018-10-31 Jiecao Chen, Qin Zhang

The number of triangles in a graph is a fundamental metric, used in social network analysis, link classification and recommendation, and more. Driven by these applications and the trend that modern graph datasets are both large and dynamic,…

Databases · Computer Science 2013-08-12 Kanat Tangwongsan, A. Pavan, Srikanta Tirthapura

We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…

Data Structures and Algorithms · Computer Science 2025-01-20 Artur Czumaj, Gopinath Mishra, Anish Mukherjee

The availability of a large number of processing nodes in a parallel and distributed computing environment enables sophisticated real-time processing over high-speed data streams, as required by many emerging applications. Sliding window…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-07-26 Abhirup Chakraborty, Ajit Singh

Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data…

Machine Learning · Computer Science 2021-12-21 Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we…

Data Structures and Algorithms · Computer Science 2013-05-07 Timo Bingmann, Peter Sanders

Stochastic equations play an important role in computational science, due to their ability to treat a wide variety of complex statistical problems. However, current algorithms are strongly limited by their sampling variance, which scales…

Numerical Analysis · Mathematics 2017-01-04 Bogdan Opanchuk, Simon Kiesewetter, Peter D. Drummond

A new unequal probability sampling method is proposed. This method is sequential. The decision to select or not each unit is made based on the order in which the units appear. A variant of this method allows selecting a sample from a…

Methodology · Statistics 2021-11-17 Bardia Panahbehagh, Raphaël Jauslin, Yves Tillé

Given a stream of data, a typical approach in streaming algorithms is to design a sophisticated algorithm with small memory that computes a specific statistic over the streaming data. Usually, if one wants to compute a different statistic…

Data Structures and Algorithms · Computer Science 2014-08-13 Vladimir Braverman, Rafail Ostrovsky, Alan Roytman

In this paper, we design the first streaming algorithms for the problem of multitasking scheduling on parallel machines with shared processing. In one pass, our streaming approximation schemes can provide an approximate value of the optimal…

Data Structures and Algorithms · Computer Science 2022-04-06 Bin Fu, Yumei Huo, Hairong Zhao

In this paper we study the extraction of representative elements in the data stream model in the form of submodular maximization. Different from the previous work on streaming submodular maximization, we are interested only in the recent…

Data Structures and Algorithms · Computer Science 2016-11-02 Jiecao Chen, Huy L. Nguyen, Qin Zhang

We study the problem of minimizing total completion time on parallel machines subject to varying processing capacity. In this paper, we develop an approximation scheme for the problem under the data stream model where the input data is…

Data Structures and Algorithms · Computer Science 2022-04-06 Bin Fu, Yumei Huo, Hairong Zhao

We introduce and study the problem of computing the similarity self-join in a streaming context (SSSJ), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose…

Databases · Computer Science 2016-03-09 Gianmarco De Francisci Morales, Aristides Gionis

This paper introduces a scheme for data stream processing which is robust to batch duration. Streaming frameworks process streams in batches retrieved at fixed time intervals. In a common setting a pattern recognition algorithm is applied…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-20 David Tolpin

In statistical learning for real-world large-scale data problems, one must often resort to "streaming" algorithms which operate sequentially on small batches of data. In this work, we present an analysis of the information-theoretic limits…

Machine Learning · Statistics 2018-01-22 Andre Manoel, Florent Krzakala, Eric W. Tramel, Lenka Zdeborová

The manuscript introduces a method to select a random sample from a stream by deciding on each sampling unit immediately after observing it. The process could be applied to unequal as well as equal probability sampling. The implementation…

Data Structures and Algorithms · Computer Science 2021-11-19 Bardia Panahbehagh, Raphaël Jauslin, Yves Tillé

We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and…

Data Structures and Algorithms · Computer Science 2019-11-18 Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, Carsten Dachsbacher

The practicality of a video surveillance system is adversely limited by the number of queries that can be placed on human resources and their vigilance in response. To transcend this limitation, a major effort under way is to include…

Computer Vision and Pattern Recognition · Computer Science 2014-05-16 Samaneh Khoshrou, Jaime S. Cardoso, Luis F. Teixeira

Streaming computation plays an important role in large-scale data analysis. The sliding window model is a model of streaming computation which also captures the recency of the data. In this model, data arrives one item at a time, but only…

Data Structures and Algorithms · Computer Science 2021-11-01 Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Peilin Zhong

A streaming model is one where data items arrive over a long period of time, either one item at a time or in bursts. Typical tasks include computing various statistics over a sliding window of some fixed time-horizon. What makes the streaming…

Data Structures and Algorithms · Computer Science 2008-04-14 Vladimir Braverman, Rafail Ostrovsky, Carlo Zaniolo