Related papers: Parallel Weighted Random Sampling

200 papers

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our…

Data Structures and Algorithms · Computer Science 2020-02-26 Lorenz Hübschle-Schneider, Peter Sanders

We investigate distributed memory parallel sorting algorithms that scale to the largest available machines and are robust with respect to input size and distribution of the input elements. The main outcome is that four sorting algorithms…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-17 Michael Axtmann, Peter Sanders

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights…

Data Structures and Algorithms · Computer Science 2019-04-09 Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff

Efficient learning from streaming data is important for modern data analysis due to the continuous and rapid evolution of data streams. Despite significant advancements in stream pattern mining, challenges persist, particularly in managing…

Machine Learning · Computer Science 2024-11-04 Lamine Diop, Marc Plantevit, Arnaud Soulet

We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and…

Data Structures and Algorithms · Computer Science 2019-11-18 Peter Sanders, Sebastian Lamm, Lorenz Hübschle-Schneider, Emanuel Schrade, Carsten Dachsbacher

Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…

Computation · Statistics 2024-06-04 Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan

An alias table is a data structure that allows for efficiently drawing weighted random samples in constant time and can be constructed in linear time. The PSA algorithm by Hübschle-Schneider and Sanders is able to construct alias tables…

Data Structures and Algorithms · Computer Science 2022-05-24 Hans-Peter Lehmann, Lorenz Hübschle-Schneider, Peter Sanders

Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to…

Computation · Statistics 2016-02-08 Matthias Katzfuss, Dorit Hammerling

Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and…

Data Structures and Algorithms · Computer Science 2015-02-26 Michael Axtmann, Timo Bingmann, Peter Sanders, Christian Schulz

Meshless methods are used to solve partial differential equations by approximating differential operators at a node as a weighted sum of values at its neighbours. One of the algorithms for generating nodes suitable for meshless numerical…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-11 Jon Vehovar, Miha Rot, Matjaž Depolli, Gregor Kosec

We compare different methods for sampling from discrete probability distributions and introduce a new algorithm which is especially efficient on massively parallel processors, such as GPUs. The scheme preserves the distribution properties…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-02 Nikolaus Binder, Alexander Keller

Maximum weight matching is one of the most fundamental combinatorial optimization problems with a wide range of applications in data mining and bioinformatics. Developing distributed weighted matching algorithms is challenging due to the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-06 Sepehr Assadi, MohammadHossein Bateni, Vahab Mirrokni

This paper is concerned with distributed limited memory prediction for continuous-time linear stochastic systems with multiple sensors. A distributed fusion with the weighted sum structure is applied to the optimal local limited memory…

Other Computer Science · Computer Science 2010-02-18 Ha-ryong Song, Vladimir Shin

Arrival of multicore systems has enforced a new scenario in computing, the parallel and distributed algorithms are fast replacing the older sequential algorithms, with many challenges of these techniques. The distributed algorithms provide…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-13 Rajendra Purohit, K R Chowdhary, S D Purohit

We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use…

Mathematical Software · Computer Science 2015-06-29 François-Henry Rouet, Xiaoye S. Li, Pieter Ghysels, Artem Napov

In this work, we present a new random sampling method for data streams where the probability of an element's inclusion in the sample is proportional to a weight associated with that element. Our method is based on sampling with replacement,…

Data Structures and Algorithms · Computer Science 2026-03-18 Adriano Meligrana, Adriano Fazzone

In this letter, a permutation enhanced parallel reconstruction architecture for compressive sampling is proposed. In this architecture, a measurement matrix is constructed from a block-diagonal sensing matrix and the sparsifying basis of…

Information Theory · Computer Science 2014-09-01 Hao Fang, Sergiy A. Vorobyov, Hai Jiang

This article introduces an algorithm, MergeShuffle, which is an extremely efficient algorithm to generate random permutations (or to randomly permute an existing array). It is easy to implement, runs in $n\log_2 n + O(1)$ time, is in-place,…

Data Structures and Algorithms · Computer Science 2015-08-14 Axel Bacher, Olivier Bodini, Alexandros Hollender, Jérémie Lumbroso

In this era of large-scale data, distributed systems built on top of clusters of commodity hardware provide cheap and reliable storage and scalable processing of massive data. Here, we review recent work on developing and implementing…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-28 Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

Computing fixed-radius near-neighbor graphs is an important first step for many data analysis algorithms. Near-neighbor graphs connect points that are close under some metric, endowing point clouds with a combinatorial structure. As…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-17 Gabriel Raulet, Dmitriy Morozov, Aydin Buluc, Katherine Yelick