Parallel Streaming Random Sampling

Data Structures and Algorithms 2019-06-11 v1

Abstract

This paper investigates parallel random sampling from a potentially unending data stream whose elements arrive as a series of minibatches (sequences of elements). While stream sampling has been studied extensively in the sequential setting, little has been explored in the parallel context, and prior parallel random-sampling algorithms focus on the static batch model. We present parallel algorithms for minibatch-stream sampling in two settings: (1) sliding window, which draws samples from a prespecified number of the most recently observed elements, and (2) infinite window, which draws samples from all elements received so far. Our algorithms are computationally and memory efficient: their work matches that of the fastest sequential counterpart, their parallel depth is small (polylogarithmic), and their memory usage matches the best known.
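To make the two settings concrete, here is a minimal sequential sketch in Python. It is not the paper's parallel algorithm: the infinite-window class uses classic reservoir sampling, and the sliding-window class is a naive baseline that buffers the whole window. The class names, the `process_minibatch` and `draw` methods, and the parameters `s` (sample size) and `w` (window size) are assumptions made for illustration only.

```python
import random
from collections import deque


class InfiniteWindowSampler:
    """Uniform sample of size s, without replacement, over all elements seen so far."""

    def __init__(self, s, seed=None):
        self.s = s
        self.n = 0          # number of elements observed so far
        self.sample = []    # current sample, at most s elements
        self.rng = random.Random(seed)

    def process_minibatch(self, batch):
        # Sequential reservoir sampling, applied element by element.
        for x in batch:
            self.n += 1
            if len(self.sample) < self.s:
                self.sample.append(x)
            else:
                j = self.rng.randrange(self.n)  # uniform in [0, n)
                if j < self.s:
                    self.sample[j] = x          # x replaces a current sample slot


class SlidingWindowSampler:
    """Naive baseline: buffer the last w elements and sample from them on demand."""

    def __init__(self, w, seed=None):
        self.window = deque(maxlen=w)  # older elements fall off automatically
        self.rng = random.Random(seed)

    def process_minibatch(self, batch):
        self.window.extend(batch)

    def draw(self, s):
        # Sample without replacement from the current window contents.
        return self.rng.sample(list(self.window), min(s, len(self.window)))
```

The naive sliding-window version stores all `w` window elements, which is exactly the kind of memory cost that more careful streaming algorithms aim to avoid; it serves only to pin down what "sampling from the most recent elements" means.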

Cite

@article{arxiv.1906.04120,
  title  = {Parallel Streaming Random Sampling},
  author = {Kanat Tangwongsan and Srikanta Tirthapura},
  journal= {arXiv preprint arXiv:1906.04120},
  year   = {2019}
}