Related papers: Revisiting Norm Estimation in Data Streams
For each $p \in (0,2]$, we present a randomized algorithm that returns an $\epsilon$-approximation of the $p$th frequency moment of a data stream $F_p = \sum_{i = 1}^n \abs{f_i}^p$. The algorithm requires space $O(\epsilon^{-2} \log…
We study the classical problem of moment estimation of an underlying vector whose $n$ coordinates are implicitly defined through a series of updates in a data stream. We show that if the updates to the vector arrive in the random-order…
We present an algorithm for computing $F_p$, the $p$th moment of an $n$-dimensional frequency vector of a data stream, for $2 < p < \log (n) $, to within $1\pm \epsilon$ factors, $\epsilon \in [n^{-1/p},1]$ with high constant probability.…
We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying $n$-dimensional vector presented as a sequence of additive updates in a stream. It…
We study $\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For $p \in (0,2)$, we present a nearly space-optimal approximate $\ell_p$ sampler that uses $\widetilde{O}(\log n \log(1/\delta))$ bits…
We show an improved lower bound for the Fp estimation problem in a data stream setting for p>2. A data stream is a sequence of items from the domain [n] with possible repetitions. The frequency vector x is an n-dimensional non-negative…
We give a space-optimal algorithm with update time O(log^2(1/eps)loglog(1/eps)) for (1+eps)-approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream. This provides a nearly exponential improvement in…
One of the oldest problems in the data stream model is to approximate the $p$-th moment $\|\mathcal{X}\|_p^p = \sum_{i=1}^n |\mathcal{X}_i|^p$ of an underlying vector $\mathcal{X} \in \mathbb{R}^n$, which is presented as a sequence of…
A data stream is viewed as a sequence of $M$ updates of the form $(\text{index},i,v)$ to an $n$-dimensional integer frequency vector $f$, where the update changes $f_i$ to $f_i + v$, and $v$ is an integer and assumed to be in $\{-m, ...,…
Estimating the first moment of a data stream defined as $F_1 = \sum_{i \in \{1, 2, \ldots, n\}} \abs{f_i}$ to within $1 \pm \epsilon$-relative error with high probability is a basic and influential problem in data stream processing. A tight…
A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called…
In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these…
The \emph{$\ell_2$ tracking problem} is the task of obtaining a streaming algorithm that, given access to a stream of items $a_1,a_2,a_3,\ldots$ from a universe $[n]$, outputs at each time $t$ an estimate to the $\ell_2$ norm of the…
Histograms, i.e., piece-wise constant approximations, are a popular tool used to represent data distributions. Traditionally, the difference between the histogram and the underlying distribution (i.e., the approximation error) is measured…
We propose a novel framework for statistical estimation on noisy datasets. Within this framework, we focus on the frequency moments ($F_p$) problem and demonstrate that it is possible to approximate $F_p$ of the unknown ground-truth dataset…
Computing the approximate quantiles or ranks of a stream is a fundamental task in data monitoring. Given a stream of elements $x_1, x_2, \dots, x_n$ and a query $x$, a relative-error quantile estimation algorithm can estimate the rank of…
We present a randomized algorithm for estimating the $p$th moment $F_p$ of the frequency vector of a data stream in the general update (turnstile) model to within a multiplicative factor of $1 \pm \epsilon$, for $p > 2$, with high constant…
In a streaming constraint satisfaction problem (streaming CSP), a $p$-pass algorithm receives the constraints of an instance sequentially, making $p$ passes over the input in a fixed order, with the goal of approximating the maximum…
Given a finite set of points $P \subseteq \mathbb{R}^d$, we would like to find a small subset $S \subseteq P$ such that the convex hull of $S$ approximately contains $P$. More formally, every point in $P$ is within distance $\epsilon$ from…
In the distributed monitoring model, a data stream over a universe of size $n$ is distributed over $k$ servers, who must continuously provide certain statistics of the overall dataset, while minimizing communication with a central…