Related papers: (Learned) Frequency Estimation Algorithms under Zi…
Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically…
Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream $S$ of elements from some universe $U=\{1 \ldots n\}$, the goal is to compute, in a single pass, a short sketch of $S$ so that for any…
The Count-Min sketch is an important and well-studied data summarization method. It allows one to estimate the count of any item in a stream using a small, fixed size data sketch. However, the accuracy of the sketch depends on…
Demands are increasing to measure per-flow statistics in the data plane of high-speed switches. Measuring flows with exact counting is infeasible due to processing and memory constraints, but a sketch is a promising candidate for collecting…
Conservative Count-Min, an improved version of Count-Min sketch [Cormode, Muthukrishnan 2005], is an online-maintained hashing-based data structure summarizing element frequency information without storing elements themselves. Although…
Frequency estimation in streaming data often relies on sketches like Count-Min (CM) to provide approximate answers with sublinear space. However, CM sketches introduce additive errors that disproportionately impact low-frequency elements,…
Count-Min Sketch with Conservative Updates (CMS-CU) is a popular algorithm to approximately count items' appearances in a data stream. Despite CMS-CU's widespread adoption, the theoretical analysis of its performance is still wanting…
This paper develops conformal inference methods to construct a confidence interval for the frequency of a queried object in a very large discrete data set, based on a sketch with a lower memory footprint. This approach requires no knowledge…
Count-Min Sketch with Conservative Updates (CMS-CU) is a memory-efficient hash-based data structure used to estimate the occurrences of items within a data stream. CMS-CU stores $m$ counters and employs $d$ hash functions to map items to…
Count-Min Sketch is a widely adopted algorithm for approximate event counting in large scale processing. However, the original version of the Count-Min-Sketch (CMS) suffers of some deficiences, especially if one is interested by the…
We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…
A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in very large data sets, based on a much smaller sketch of those data. The approach is data-adaptive and requires no…
An influential paper of Hsu et al. (ICLR'19) introduced the study of learning-augmented streaming algorithms in the context of frequency estimation. A fundamental problem in the streaming literature, the goal of frequency estimation is to…
Recently there has been increased interest in using machine learning techniques to improve classical algorithms. In this paper we study when it is possible to construct compact, composable sketches for weighted sampling and statistics…
We consider massive distributed datasets that consist of elements modeled as key-value pairs and the task of computing statistics or aggregates where the contribution of each key is weighted by a function of its frequency (sum of values of…
The sliding window model of computation captures scenarios in which data are continually arriving in the form of a stream, and only the most recent $w$ items are used for analysis. In this setting, an algorithm needs to accurately track…
In data stream applications, one of the critical issues is to estimate the frequency of each item in the specific multiset. The multiset means that each item in this set can appear multiple times. The data streams in many applications are…
Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a…
Sketches are probabilistic data structures that can provide approximate results within mathematically proven error bounds while using orders of magnitude less memory than traditional approaches. They are tailored for streaming data analysis…
Frequency estimation of elements is an important task for summarizing data streams and machine learning applications. The problem is often addressed by using streaming algorithms with sublinear space data structures. These algorithms allow…