Related papers: Parallel Weighted Random Sampling

Communication-Efficient (Weighted) Reservoir Sampling from Fully Distributed Data Streams

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our…

Data Structures and Algorithms · Computer Science 2020-02-26 Lorenz Hübschle-Schneider, Peter Sanders

Robust Massively Parallel Sorting

We investigate distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements. The main outcome is that four sorting algorithms…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-17 Michael Axtmann, Peter Sanders

Weighted Reservoir Sampling from Distributed Streams

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights…

Data Structures and Algorithms · Computer Science 2019-04-09 Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff

RPS: A Generic Reservoir Patterns Sampler

Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing…

Machine Learning · Computer Science 2024-11-04 Lamine Diop, Marc Plantevit, Arnaud Soulet

Efficient Random Sampling -- Parallel, Vectorized, Cache-Efficient, and Online

We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and…

Data Structures and Algorithms · Computer Science 2019-11-18 Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, Carsten Dachsbacher

A Partition-insensitive Parallel Framework for Distributed Model Fitting

Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…

Computation · Statistics 2024-06-04 Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan

Weighted Random Sampling on GPUs

An alias table is a data structure that allows for efficiently drawing weighted random samples in constant time and can be constructed in linear time. The PSA algorithm by Hübschle-Schneider and Sanders is able to construct alias tables…

Data Structures and Algorithms · Computer Science 2022-05-24 Hans-Peter Lehmann, Lorenz Hübschle-Schneider, Peter Sanders

Parallel inference for massive distributed spatial data using low-rank models

Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to…

Computation · Statistics 2016-02-08 Matthias Katzfuss, Dorit Hammerling

Practical Massively Parallel Sorting

Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and…

Data Structures and Algorithms · Computer Science 2015-02-26 Michael Axtmann, Timo Bingmann, Peter Sanders, Christian Schulz

Load Balanced Parallel Node Generation for Meshless Numerical Methods

Meshless methods are used to solve partial differential equations by approximating differential operators at a node as a weighted sum of values at its neighbours. One of the algorithms for generating nodes suitable for meshless numerical…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-11 Jon Vehovar, Miha Rot, Matjaž Depolli, Gregor Kosec

Massively Parallel Construction of Radix Tree Forests for the Efficient Sampling of Discrete Probability Distributions

We compare different methods for sampling from discrete probability distributions and introduce a new algorithm which is especially efficient on massively parallel processors, such as GPUs. The scheme preserves the distribution properties…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-02 Nikolaus Binder, Alexander Keller

Distributed Weighted Matching via Randomized Composable Coresets

Maximum weight matching is one of the most fundamental combinatorial optimization problems with a wide range of applications in data mining and bioinformatics. Developing distributed weighted matching algorithms is challenging due to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-06 Sepehr Assadi, MohammadHossein Bateni, Vahab Mirrokni

Limited Memory Prediction for Linear Systems with Different types of Observation

This paper is concerned with distributed limited memory prediction for continuous-time linear stochastic systems with multiple sensors. A distributed fusion with the weighted sum structure is applied to the optimal local limited memory…

Other Computer Science · Computer Science 2010-02-18 Ha-ryong Song, Vladimir Shin

On the Design and Analysis of Parallel and Distributed Algorithms

Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-13 Rajendra Purohit, K R Chowdhary, S D Purohit

A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization

We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use…

Mathematical Software · Computer Science 2015-06-29 François-Henry Rouet, Xiaoye S. Li, Pieter Ghysels, Artem Napov

Weighted Reservoir Sampling With Replacement from Data Streams

In this work, we present a new random sampling method for data streams where the probability of an element's inclusion in the sample is proportional to a weight associated with that element. Our method is based on sampling with replacement,…

Data Structures and Algorithms · Computer Science 2026-03-18 Adriano Meligrana, Adriano Fazzone

Permutation Enhanced Parallel Reconstruction with A Linear Compressive Sampling Device

In this letter, a permutation enhanced parallel reconstruction architecture for compressive sampling is proposed. In this architecture, a measurement matrix is constructed from a block-diagonal sensing matrix and the sparsifying basis of…

Information Theory · Computer Science 2014-09-01 Hao Fang, Sergiy A. Vorobyov, Hai Jiang

MergeShuffle: A Very Fast, Parallel Random Permutation Algorithm

This article introduces an algorithm, MergeShuffle, which is an extremely efficient algorithm to generate random permutations (or to randomly permute an existing array). It is easy to implement, runs in $n\log_2 n + O(1)$ time, is in-place,…

Data Structures and Algorithms · Computer Science 2015-08-14 Axel Bacher, Olivier Bodini, Alexandros Hollender, Jérémie Lumbroso

Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments

In this era of large-scale data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable processing of massive data. Here, we review recent work on developing and implementing…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-28 Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

Distributed-Memory Parallel Algorithms for Fixed-Radius Near Neighbor Graph Construction

Computing fixed-radius near-neighbor graphs is an important first step for many data analysis algorithms. Near-neighbor graphs connect points that are close under some metric, endowing point clouds with a combinatorial structure. As…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-17 Gabriel Raulet, Dmitriy Morozov, Aydin Buluc, Katherine Yelick