Sampling Multiple Edges Efficiently

Data Structures and Algorithms 2021-07-21 v4

Abstract

We present a sublinear-time algorithm that samples multiple edges from a distribution that is pointwise $\epsilon$-close to the uniform distribution, in an amortized-efficient fashion. We consider the adjacency list query model, where access to a graph $G$ is given via degree and neighbor queries. The problem of sampling a single edge in this model was raised by Eden and Rosenbaum (SOSA 18). Let $n$ and $m$ denote the number of vertices and edges of $G$, respectively. Eden and Rosenbaum provided upper and lower bounds of $\Theta^*(n/\sqrt{m})$ for sampling a single edge in general graphs (where $O^*(\cdot)$ suppresses $\textrm{poly}(1/\epsilon)$ and $\textrm{poly}(\log n)$ dependencies). We ask whether the query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we get an improved amortized per-sample cost if we allow a preprocessing phase? We answer in the affirmative. We present an algorithm that, if one knows the number of required samples $q$ in advance, has an overall cost that is sublinear in $q$, namely $O^*(\sqrt{q} \cdot (n/\sqrt{m}))$, which is strictly preferable to the $O^*(q \cdot (n/\sqrt{m}))$ cost resulting from $q$ invocations of the algorithm by Eden and Rosenbaum. Subsequent to a preliminary version of this work, Tětek and Thorup (arXiv, preprint) proved that this bound is essentially optimal.
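To make the query model concrete, the following is a minimal Python sketch of adjacency-list access (degree and neighbor queries) with a query counter, plus a simple rejection-sampling baseline for drawing one uniform edge. It is an illustration only and is not the algorithm of this paper or of Eden and Rosenbaum. The names `AdjacencyListOracle` and `sample_edge_by_rejection` are hypothetical, and the sampler assumes that an upper bound `max_degree` on the vertex degrees is known and that the graph has at least one edge.

```python
import random


class AdjacencyListOracle:
    """Hypothetical wrapper exposing only degree and neighbor queries."""

    def __init__(self, adj):
        self._adj = adj      # adj[v] is the neighbor list of vertex v
        self.n = len(adj)    # vertices are 0 .. n-1
        self.queries = 0     # total number of queries issued so far

    def degree(self, v):
        """Degree query: return deg(v)."""
        self.queries += 1
        return len(self._adj[v])

    def neighbor(self, v, i):
        """Neighbor query: return the i-th neighbor of v (0-indexed),
        or None if i >= deg(v)."""
        self.queries += 1
        nbrs = self._adj[v]
        return nbrs[i] if i < len(nbrs) else None


def sample_edge_by_rejection(oracle, max_degree, rng):
    """Return a uniformly random directed edge (u, v).

    Baseline for illustration: pick a uniform vertex u and a uniform
    slot i in [0, max_degree); accept if u has an i-th neighbor.
    Every directed edge occupies exactly one (u, i) slot, so each is
    returned with equal probability. Assumes max_degree >= every
    degree and that the graph has at least one edge.
    """
    while True:
        u = rng.randrange(oracle.n)
        i = rng.randrange(max_degree)
        v = oracle.neighbor(u, i)
        if v is not None:
            return (u, v)


# Usage: a star on 4 vertices (center 0), so max_degree = 3.
adj = [[1, 2, 3], [0], [0], [0]]
oracle = AdjacencyListOracle(adj)
rng = random.Random(0)
edges = [sample_edge_by_rejection(oracle, 3, rng) for _ in range(5)]
print(edges, "queries used:", oracle.queries)
```

Each accepted sample in this baseline costs about $n \cdot d_{\max} / (2m)$ neighbor queries in expectation, where $d_{\max}$ is the assumed degree bound, and repeating it $q$ times scales the cost linearly in $q$. The counter `queries` is the quantity that the bounds in the abstract measure.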

Cite

@article{arxiv.2008.08032,
  title  = {Sampling Multiple Edges Efficiently},
  author = {Talya Eden and Saleet Mossel and Ronitt Rubinfeld},
  journal= {arXiv preprint arXiv:2008.08032},
  year   = {2021}
}