Related papers: An Improved Interactive Streaming Algorithm for th…
The efficient estimation of frequency moments of a data stream in one-pass using limited space and time per item is one of the most fundamental problem in data stream processing. An especially important estimation is to find the number of…
Estimating the first moment of a data stream defined as $F_1 = \sum_{i \in \{1, 2, \ldots, n\}} \abs{f_i}$ to within $1 \pm \epsilon$-relative error with high probability is a basic and influential problem in data stream processing. A tight…
The distinct elements problem is one of the fundamental problems in streaming algorithms --- given a stream of integers in the range $\{1,\ldots,n\}$, we wish to provide a $(1+\varepsilon)$ approximation to the number of distinct elements…
We study the problem of partitioning integer sequences in the one-pass data streaming model. Given is an input stream of integers $X \in \{0, 1, \dots, m \}^n$ of length $n$ with maximum element $m$, and a parameter $p$. The goal is to…
We study the general problem of computing frequency-based functions, i.e., the sum of any given function of data stream frequencies. Special cases include fundamental data stream problems such as computing the number of distinct elements…
We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying $n$-dimensional vector presented as a sequence of additive updates in a stream. It…
Given data stream $D = \{p_1,p_2,...,p_m\}$ of size $m$ of numbers from $\{1,..., n\}$, the frequency of $i$ is defined as $f_i = |\{j: p_j = i\}|$. The $k$-th \emph{frequency moment} of $D$ is defined as $F_k = \sum_{i=1}^n f_i^k$. We…
In a ground-breaking paper, Indyk and Woodruff (STOC 05) showed how to compute $F_k$ (for $k>2$) in space complexity $O(\mbox{\em poly-log}(n,m)\cdot n^{1-\frac2k})$, which is optimal up to (large) poly-logarithmic factors in $n$ and $m$,…
When facing a very large stream of data, it is often desirable to extract most important statistics online in a short time and using small memory. For example, one may want to quickly find the most influential users generating posts online…
In this paper we consider the problem of approximating frequency moments in the streaming model. Given a stream $D = \{p_1,p_2,\dots,p_m\}$ of numbers from $\{1,\dots, n\}$, a frequency of $i$ is defined as $f_i = |\{j: p_j = i\}|$. The…
Estimating the second frequency moment $F_2$ of a data stream up to a $(1 \pm \varepsilon)$ factor is a central problem in the streaming literature. For errors $\varepsilon > \Omega(1/\sqrt{n})$, the tight bound…
We consider the streaming complexity of a fundamental task in approximate pattern matching: the $k$-mismatch problem. It asks to compute Hamming distances between a pattern of length $n$ and all length-$n$ substrings of a text for which the…
For each $p \in (0,2]$, we present a randomized algorithm that returns an $\epsilon$-approximation of the $p$th frequency moment of a data stream $F_p = \sum_{i = 1}^n \abs{f_i}^p$. The algorithm requires space $O(\epsilon^{-2} \log…
Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary -- for many streaming problems, both relaxations must be employed…
We introduce a new notion of information complexity for multi-pass streaming problems and use it to resolve several important questions in data streams. In the coin problem, one sees a stream of $n$ i.i.d. uniform bits and one would like to…
We study the classic NP-Hard problem of finding the maximum $k$-set coverage in the data stream model: given a set system of $m$ sets that are subsets of a universe $\{1,\ldots,n \}$, find the $k$ sets that cover the most number of distinct…
We study the power of Arthur-Merlin probabilistic proof systems in the data stream model. We show a canonical $\mathcal{AM}$ streaming algorithm for a wide class of data stream problems. The algorithm offers a tradeoff between the length of…
We study the classical problem of moment estimation of an underlying vector whose $n$ coordinates are implicitly defined through a series of updates in a data stream. We show that if the updates to the vector arrive in the random-order…
Detecting frequent elements is among the oldest and most-studied problems in the area of data streams. Given a stream of $m$ data items in $\{1, 2, \dots, n\}$, the objective is to output items that appear at least $d$ times, for some…
Constraint satisfaction problems (CSP's) and data stream models are two powerful abstractions to capture a wide variety of problems arising in different domains of computer science. Developments in the two communities have mostly occurred…