Efficient Random Sampling -- Parallel, Vectorized, Cache-Efficient, and Online
Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Mathematical Software
2019-11-18 (v2)
Abstract
We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time $\mathcal{O}(n/p+\log p)$ on $p$ processors, i.e., it scales to massively parallel machines even for moderate values of $n$. The amount of communication between the processors is very small (at most $\mathcal{O}(\log p)$) and independent of the sample size. We also discuss modifications needed for load balancing, online sampling, sampling with replacement, Bernoulli sampling, and vectorization on SIMD units or GPUs.
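To make the divide-and-conquer idea concrete, here is a minimal Python sketch, assuming the split works as follows: draw how many of the n samples fall into the left half of the range from a hypergeometric distribution, then recurse independently on both halves, switching to a plain sequential sampler once a subproblem is small. The names `sample_range` and `hypergeometric`, the `base_case` threshold, and the naive urn simulation used for the hypergeometric draw are illustrative assumptions, not the authors' implementation. The sketch samples from the half-open range [lo, hi), so sampling from 1..N corresponds to `lo = 1`, `hi = N + 1`.

```python
import random


def hypergeometric(good: int, bad: int, draws: int, rng: random.Random) -> int:
    """Count 'good' items among `draws` items taken without replacement
    from an urn holding `good` good and `bad` bad items.

    Assumption: naive O(draws) urn simulation, used only to keep the sketch
    dependency-free; an efficient version would use a dedicated sampler.
    """
    successes = 0
    for _ in range(draws):
        # Pick one remaining item uniformly; it is 'good' with prob good/(good+bad).
        if rng.randrange(good + bad) < good:
            successes += 1
            good -= 1
        else:
            bad -= 1
    return successes


def sample_range(lo: int, hi: int, n: int, rng: random.Random,
                 base_case: int = 64) -> list[int]:
    """Return n distinct integers from [lo, hi), chosen uniformly without
    replacement, in sorted order (hypothetical helper for illustration)."""
    size = hi - lo
    if n < 0 or n > size:
        raise ValueError("need 0 <= n <= hi - lo")
    if n == 0:
        return []
    if n <= base_case:
        # Small subproblem: hand it to a sequential sampler.
        return sorted(rng.sample(range(lo, hi), n))
    mid = lo + size // 2
    # Number of the n samples that land in the left half [lo, mid).
    n_left = hypergeometric(mid - lo, hi - mid, n, rng)
    # Given n_left, the two halves are independent uniform subproblems.
    left = sample_range(lo, mid, n_left, rng, base_case)
    right = sample_range(mid, hi, n - n_left, rng, base_case)
    # Every left element is < mid <= every right element, so this stays sorted.
    return left + right


if __name__ == "__main__":
    rng = random.Random(1)
    print(sample_range(1, 10**9 + 1, 10, rng, base_case=4))
```

The two recursive calls touch disjoint subranges with already fixed sample counts, so they could in principle run on different processors. This sketch runs them sequentially with one shared `random.Random`, whereas a parallel variant would need an independent random stream per processor. Keeping the base cases small also keeps each sequential subproblem's working set small, which is roughly where the cache-efficiency benefit would come from.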
Cite
@article{arxiv.1610.05141,
title = {Efficient Random Sampling -- Parallel, Vectorized, Cache-Efficient, and Online},
author = {Peter Sanders and Sebastian Lamm and Lorenz Hübschle-Schneider and Emanuel Schrade and Carsten Dachsbacher},
journal= {arXiv preprint arXiv:1610.05141},
year = {2019}
}