Related papers: Streaming Algorithms from Precision Sampling
The problem of estimating the pth moment F_p (p nonnegative and real) in data streams is as follows. There is a vector x which starts at 0, and many updates of the form x_i <-- x_i + v come sequentially in a stream. The algorithm also…
This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete…
We study $\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For $p \in (0,2)$, we present a nearly space-optimal approximate $\ell_p$ sampler that uses $\widetilde{O}(\log n \log(1/\delta))$ bits…
Low-rank approximation in data streams is a fundamental and significant task in computing science, machine learning and statistics. Multiple streaming algorithms have emerged over years and most of them are inspired by randomized…
We study the classical problem of moment estimation of an underlying vector whose $n$ coordinates are implicitly defined through a series of updates in a data stream. We show that if the updates to the vector arrive in the random-order…
Most known algorithms in the streaming model of computation aim to approximate a single function such as an $\ell_p$-norm. In 2009, Nelson [\url{https://sublinear.info}, Open Problem 30] asked if it possible to design \emph{universal…
We introduce a new computational model for data streams: asymptotically exact streaming algorithms. These algorithms have an approximation ratio that tends to one as the length of the stream goes to infinity while the memory used by the…
Given data stream $D = \{p_1,p_2,...,p_m\}$ of size $m$ of numbers from $\{1,..., n\}$, the frequency of $i$ is defined as $f_i = |\{j: p_j = i\}|$. The $k$-th \emph{frequency moment} of $D$ is defined as $F_k = \sum_{i=1}^n f_i^k$. We…
In insertion-only streaming, one sees a sequence of indices $a_1, a_2, \ldots, a_m\in [n]$. The stream defines a sequence of $m$ frequency vectors $x^{(1)},\ldots,x^{(m)}\in\mathbb{R}^n$ with $(x^{(t)})_i = |\{j : j\in[t], a_j = i\}|$. That…
Histograms, i.e., piece-wise constant approximations, are a popular tool used to represent data distributions. Traditionally, the difference between the histogram and the underlying distribution (i.e., the approximation error) is measured…
The probabilistic-stream model was introduced by Jayram et al. \cite{JKV07}. It is a generalization of the data stream model that is suited to handling ``probabilistic'' data where each item of the stream represents a probability…
The manuscript introduces a method to select a random sample from a stream by deciding on each sampling unit immediately after observing it. The process could be applied to unequal as well as equal probability sampling. The implementation…
For each $p \in (0,2]$, we present a randomized algorithm that returns an $\epsilon$-approximation of the $p$th frequency moment of a data stream $F_p = \sum_{i = 1}^n \abs{f_i}^p$. The algorithm requires space $O(\epsilon^{-2} \log…
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire…
We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…
In this paper we study how to perform distinct sampling in the streaming model where data contain near-duplicates. The goal of distinct sampling is to return a distinct element uniformly at random from the universe of elements, given that…
From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir…
A streaming algorithm to compute the spectral proper orthogonal decomposition (SPOD) of stationary random processes is presented. As new data becomes available, an incremental update of the truncated eigenbasis of the estimated…
The \emph{$\ell_2$ tracking problem} is the task of obtaining a streaming algorithm that, given access to a stream of items $a_1,a_2,a_3,\ldots$ from a universe $[n]$, outputs at each time $t$ an estimate to the $\ell_2$ norm of the…
In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these…