Related papers: Optimal Tracking of Distributed Heavy Hitters and …
In this paper, we give efficient algorithms and lower bounds for solving the heavy hitters problem while preserving differential privacy in the fully distributed local model. In this model, there are n parties, each of which possesses a…
We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by…
Distributed Denial of Service (DDoS) attacks have become more prominent recently, both in frequency of occurrence and in magnitude. Such attacks render key Internet resources unavailable and disrupt the Internet's normal operation. It is…
Given a stream $S = (s_1, s_2, \ldots, s_N)$, a $\phi$-heavy hitter is an item $s_i$ that occurs at least $\phi N$ times in $S$. The problem of finding heavy hitters has been extensively studied in the database literature. In this paper, we…
Tracking and approximating data matrices in streaming fashion is a fundamental challenge. The problem requires more care and attention when data comes from multiple distributed sites, each receiving a stream of data. This paper considers…
Frequency estimation of elements is an important task for summarizing data streams and machine learning applications. The problem is often addressed by using streaming algorithms with sublinear space data structures. These algorithms allow…
We consider the problems of distributed heavy hitters and frequency moments in both the coordinator model and the distributed tracking model (also known as the distributed functional monitoring model). We present simple and optimal (up to…
Inspired by the great success of machine learning in the past decade, researchers have been considering whether theoretical results can be improved by exploiting the data distribution. In this paper, we revisit a fundamental problem…
We give the first optimal bounds for returning the $\ell_1$-heavy hitters in a data stream of insertions, together with their approximate frequencies, closing a long line of work on this problem. For a stream of $m$ items in $\{1, 2, \dots,…
We consider online mining of correlated heavy-hitters from a data stream. Given a stream of two-dimensional data, a correlated aggregate query first extracts a substream by applying a predicate along a primary dimension, and then computes…
An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-$k$, most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem,…
The Hierarchical Heavy Hitters problem extends the notion of frequent items to data arranged in a hierarchy. This problem has applications to network traffic monitoring, anomaly detection, and DDoS detection. We present a new streaming…
The distinct elements problem is one of the fundamental problems in streaming algorithms --- given a stream of integers in the range $\{1,\ldots,n\}$, we wish to provide a $(1+\varepsilon)$ approximation to the number of distinct elements…
Heavy hitters and frequency measurements are fundamental in many networking applications such as load balancing, QoS, and network security. This paper considers a generalized sliding window model that supports frequency and heavy hitters…
We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the count-tracking problem, where there are $k$ players, each holding a counter $n_i$ that gets…
Given a stream $x_1,x_2,\dots,x_n$ of items from a universe $U$ of size $\mathrm{poly}(n)$, and a parameter $\epsilon>0$, an item $i\in U$ is said to be an $\ell_2$ heavy hitter if its frequency $f_i$ in the stream is at least $\sqrt{\epsilon F_2}$,…
In many applications that involve processing high-dimensional data, it is important to identify a small set of entities that account for a significant fraction of detections. Rather than formalize this as a clustering problem, in which all…
We study the problem of tracking multiple moving targets using a team of mobile robots. Each robot has a set of motion primitives to choose from in order to collectively maximize the number of targets tracked or the total quality of…
Data streams typically have items with a large number of dimensions. We study the fundamental heavy-hitters problem in this setting. Formally, the data stream consists of $d$-dimensional items $x_1,\ldots,x_m \in [n]^d$. A $k$-dimensional…
This paper studies the classic problem of finding heavy hitters in the turnstile streaming model. We give the first deterministic linear sketch that has $O(\epsilon^{-2} \log n \cdot \log^*(\epsilon^{-1}))$ rows and answers queries in…