Frequency Estimation with One-Sided Error
Abstract
Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream $S$ of elements from some universe $U = \{1, \ldots, n\}$, the goal is to compute, in a single pass, a short sketch of $S$ so that for any element $i \in U$, one can estimate the number $x_i$ of times $i$ occurs in $S$ based on the sketch alone. Two state-of-the-art solutions to this problem are the Count-Min and Count-Sketch algorithms. The frequency estimator $\tilde{x}$ produced by Count-Min, using $O(1/\varepsilon \cdot \log n)$ dimensions, guarantees that $\|\tilde{x} - x\|_\infty \le \varepsilon \|x\|_1$ with high probability, and $\tilde{x} \ge x$ holds deterministically. Also, Count-Min works under the assumption that $x \ge 0$. On the other hand, Count-Sketch, using $O(1/\varepsilon^2 \cdot \log n)$ dimensions, guarantees that $\|\tilde{x} - x\|_\infty \le \varepsilon \|x\|_2$ with high probability. A natural question is whether it is possible to design a best-of-both-worlds sketching method, with error guarantees depending on the $\ell_2$ norm and space comparable to Count-Sketch, that (like Count-Min) also has the no-underestimation property. Our main set of results shows that the answer to the above question is negative. We show this in two incomparable computational models: linear sketching and streaming algorithms. We also study the complementary problem, where the sketch is required to not over-estimate, i.e., $\tilde{x} \le x$ should hold always.
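To make the no-underestimation property concrete, the following is a minimal Python sketch of a textbook Count-Min structure. It is an illustration only and not code from the paper. The class name `CountMin`, the hash family, and the `width`, `depth`, and `seed` values are all assumptions chosen for the example. Each row only ever adds nonnegative counts to the bucket an item hashes to, so every row's counter is at least the true count and the minimum over rows cannot underestimate.

```python
import random
from collections import Counter


class CountMin:
    """Illustrative Count-Min sketch over nonnegative integer items."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.p = (1 << 61) - 1  # a Mersenne prime used as the hash modulus
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row for the hash h(x) = ((a*x + b) mod p) mod width.
        self.coeffs = [
            (rng.randrange(1, self.p), rng.randrange(self.p))
            for _ in range(depth)
        ]

    def _bucket(self, row, item):
        a, b = self.coeffs[row]
        return ((a * item + b) % self.p) % self.width

    def update(self, item, count=1):
        # Assumes count >= 0, matching the x >= 0 assumption of Count-Min.
        for r in range(len(self.rows)):
            self.rows[r][self._bucket(r, item)] += count

    def estimate(self, item):
        # Every row overcounts (collisions only add), so take the minimum.
        return min(
            self.rows[r][self._bucket(r, item)]
            for r in range(len(self.rows))
        )


# Hypothetical stream: 1000 draws from a universe of 50 items.
stream_rng = random.Random(1)
stream = [stream_rng.randrange(50) for _ in range(1000)]
exact = Counter(stream)

sketch = CountMin(width=16, depth=4)
for item in stream:
    sketch.update(item)

# The estimate is never below the true frequency.
never_under = all(sketch.estimate(i) >= exact[i] for i in range(50))
print(never_under)
```

With only 16 buckets per row for 50 distinct items, collisions are unavoidable, so some estimates will be strictly larger than the true counts. The error is one-sided, which is the property the abstract asks whether one can retain under an $\ell_2$-type guarantee.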
Cite
@article{arxiv.2111.03953,
title = {Frequency Estimation with One-Sided Error},
author = {Piotr Indyk and Shyam Narayanan and David P. Woodruff},
journal= {arXiv preprint arXiv:2111.03953},
year = {2021}
}
Comments
To appear in SODA 2022. Abstract abridged to meet arXiv requirements - see pdf for full abstract