Frequency Estimation with One-Sided Error

Data Structures and Algorithms 2021-11-09 v1 Computational Complexity

Abstract

Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream $S$ of elements from some universe $U=\{1 \ldots n\}$, the goal is to compute, in a single pass, a short sketch of $S$ so that for any element $i \in U$, one can estimate the number $x_i$ of times $i$ occurs in $S$ based on the sketch alone. Two state-of-the-art solutions to this problem are the Count-Min and Count-Sketch algorithms. The frequency estimator $\tilde{x}$ produced by Count-Min, using $O(1/\varepsilon \cdot \log n)$ dimensions, guarantees that $\|\tilde{x}-x\|_{\infty} \le \varepsilon \|x\|_1$ with high probability, and $\tilde{x} \ge x$ holds deterministically. Also, Count-Min works under the assumption that $x \ge 0$. On the other hand, Count-Sketch, using $O(1/\varepsilon^2 \cdot \log n)$ dimensions, guarantees that $\|\tilde{x}-x\|_{\infty} \le \varepsilon \|x\|_2$ with high probability. A natural question is whether it is possible to design a best-of-both-worlds sketching method, with error guarantees depending on the $\ell_2$ norm and space comparable to Count-Sketch, that (like Count-Min) also has the no-underestimation property. Our main set of results shows that the answer to this question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. We also study the complementary problem, where the sketch is required not to overestimate, i.e., $\tilde{x} \le x$ should always hold.
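To make the two baseline guarantees concrete, here is a minimal Python sketch of textbook Count-Min and Count-Sketch. It is an illustration under stated assumptions, not the construction analyzed in the paper: the class names, the `width` and `depth` parameters, and the simple `(a*i + b) mod p` hash family are all choices made for this example. Count-Min takes the minimum over rows, so with non-negative updates it can never underestimate; Count-Sketch uses random signs and a median, so its error can fall on either side.

```python
import random
import statistics
from collections import Counter

P = (1 << 61) - 1  # Mersenne prime for the illustrative hash family (an assumption)


def _make_hash(rng):
    """Return random coefficients (a, b) for the hash i -> (a*i + b) mod P."""
    return rng.randrange(1, P), rng.randrange(P)


class CountMin:
    """Textbook Count-Min: depth rows of width counters, estimate = row minimum."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.bucket_hashes = [_make_hash(rng) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        a, b = self.bucket_hashes[row]
        return ((a * item + b) % P) % self.width

    def update(self, item, count=1):
        # Assumes count >= 0, matching the x >= 0 assumption for Count-Min.
        for row in range(len(self.table)):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # Each counter holds x_item plus non-negative collisions, so min >= x_item.
        return min(self.table[row][self._bucket(row, item)]
                   for row in range(len(self.table)))


class CountSketch:
    """Textbook Count-Sketch: signed counters, estimate = median over rows."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.bucket_hashes = [_make_hash(rng) for _ in range(depth)]
        self.sign_hashes = [_make_hash(rng) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        a, b = self.bucket_hashes[row]
        return ((a * item + b) % P) % self.width

    def _sign(self, row, item):
        c, d = self.sign_hashes[row]
        return 1 if ((c * item + d) % P) & 1 else -1

    def update(self, item, count=1):
        for row in range(len(self.table)):
            self.table[row][self._bucket(row, item)] += self._sign(row, item) * count

    def estimate(self, item):
        # Collisions enter with random signs, so the estimate may be too high or too low.
        return statistics.median(
            self._sign(row, item) * self.table[row][self._bucket(row, item)]
            for row in range(len(self.table)))


if __name__ == "__main__":
    n = 1000
    rng = random.Random(1)
    stream = [rng.randint(1, n) for _ in range(5000)]
    truth = Counter(stream)
    cm, cs = CountMin(64, 5), CountSketch(64, 5)
    for item in stream:
        cm.update(item)
        cs.update(item)
    under = sum(cs.estimate(i) < truth[i] for i in range(1, n + 1))
    print("Count-Min never underestimates:",
          all(cm.estimate(i) >= truth[i] for i in range(1, n + 1)))
    print("Count-Sketch underestimates on", under, "of", n, "elements")
```

An odd `depth` keeps the Count-Sketch median an integer. The small demo only illustrates the one-sided versus two-sided error behavior; it says nothing about the lower bounds the paper proves.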

Cite

@article{arxiv.2111.03953,
  title  = {Frequency Estimation with One-Sided Error},
  author = {Piotr Indyk and Shyam Narayanan and David P. Woodruff},
  journal= {arXiv preprint arXiv:2111.03953},
  year   = {2021}
}

Comments

To appear in SODA 2022. Abstract abridged to meet arXiv requirements; see the PDF for the full abstract.