Optimal Quantile Approximation in Streams

Data Structures and Algorithms 2016-04-07 v2

Abstract

This paper resolves one of the longest-standing basic problems in the streaming computational model, namely the optimal construction of quantile sketches. An $\varepsilon$-approximate quantile sketch receives a stream of items $x_1,\ldots,x_n$ and allows one to approximate the rank of any query up to additive error $\varepsilon n$ with probability at least $1-\delta$. The rank of a query $x$ is the number of stream items $x_i$ such that $x_i \le x$. The minimal sketch size required for this task is trivially at least $1/\varepsilon$. Felber and Ostrovsky obtain an $O((1/\varepsilon)\log(1/\varepsilon))$ space sketch for a fixed $\delta$. To date, no better upper or lower bounds were known, even for randomly permuted streams or for approximating a specific quantile, e.g., the median. This paper obtains an $O((1/\varepsilon)\log\log(1/\delta))$ space sketch and a matching lower bound. This resolves the open problem and proves a qualitative gap between randomized and deterministic quantile sketching. One of our contributions is a novel representation and modification of the widely used merge-and-reduce construction. This subtle modification allows for an analysis that is both tight and extremely simple. Similar techniques should be useful for improving other sketching objectives and geometric coreset constructions.
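The abstract describes the construction only at a high level. As a rough illustration of the compactor (merge-and-reduce) idea it builds on, here is a minimal Python sketch under stated assumptions: items stored at level $h$ each stand for $2^h$ stream items, and when a level fills it is sorted and every other item is promoted to level $h+1$ with doubled weight, starting from a random even/odd offset. That random choice changes any query's rank estimate by at most $2^h$ and by zero in expectation, while the total represented weight stays exactly $n$. The class name `CompactorSketch`, the parameters `k` (top-level capacity) and `c` (geometric shrink factor for lower levels), and the odd-count handling are hypothetical choices for illustration; this toy version does not attempt to reproduce the paper's $O((1/\varepsilon)\log\log(1/\delta))$ bound or its analysis.

```python
import random
from math import ceil


class CompactorSketch:
    """Hypothetical illustration of a compactor-based quantile sketch.

    levels[h] holds items that each stand for 2**h stream items.
    k (top-level capacity) and c (shrink factor for lower levels)
    are illustrative assumptions, not values taken from the paper.
    """

    def __init__(self, k=200, c=2 / 3, seed=None):
        self.k = k
        self.c = c
        self._rng = random.Random(seed)
        self.levels = [[]]

    def _capacity(self, h):
        # The top level gets capacity k; each level below shrinks by c.
        depth = len(self.levels) - 1 - h
        return max(2, ceil(self.k * self.c ** depth))

    def _compact(self, h):
        if h + 1 == len(self.levels):
            self.levels.append([])  # grow a new, heavier top level
        buf = sorted(self.levels[h])
        # With an odd count, hold the largest item back at this level.
        keep = [buf.pop()] if len(buf) % 2 else []
        # Keep every other item, starting at a random even/odd offset,
        # and promote the survivors with doubled weight.
        offset = self._rng.randint(0, 1)
        self.levels[h + 1].extend(buf[offset::2])
        self.levels[h] = keep

    def update(self, x):
        self.levels[0].append(x)
        h = 0
        while h < len(self.levels):
            if len(self.levels[h]) >= self._capacity(h):
                self._compact(h)
                h = 0  # capacities depend on the number of levels; rescan
            else:
                h += 1

    def rank(self, x):
        # Estimated number of stream items <= x.
        return sum((1 << h) * sum(1 for y in level if y <= x)
                   for h, level in enumerate(self.levels))

    def weight(self):
        # Total represented weight; compaction preserves it, so this equals n.
        return sum(len(level) << h for h, level in enumerate(self.levels))

    def size(self):
        # Number of items actually stored.
        return sum(len(level) for level in self.levels)


if __name__ == "__main__":
    n = 20000
    stream = list(range(n))
    random.Random(0).shuffle(stream)
    sketch = CompactorSketch(k=200, seed=1)
    for x in stream:
        sketch.update(x)
    # In this stream the exact rank of q is q + 1.
    worst = max(abs(sketch.rank(q) - (q + 1)) for q in range(0, n, 1000))
    print("stored items:", sketch.size(), "max error / n:", worst / n)
```

With `k=200` the sketch keeps only a few hundred items while representing all $n$ stream items; shrinking the lower levels geometrically (the assumed `c`) is one way to spend most of the space on the heaviest, most error-prone levels.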

Cite

@article{arxiv.1603.05346,
  title  = {Optimal Quantile Approximation in Streams},
  author = {Zohar Karnin and Kevin Lang and Edo Liberty},
  journal= {arXiv preprint arXiv:1603.05346},
  year   = {2016}
}