Related papers

Related papers: On Approximating Frequency Moments of Data Streams…

200 papers

Compressed Counting (CC), based on maximally skewed stable random projections, was recently proposed for estimating the p-th frequency moments of data streams. The case p → 1 is extremely useful for estimating Shannon entropy of data…

Data Structures and Algorithms · Computer Science 2009-10-09 Ping Li
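Why p → 1 connects to Shannon entropy can be illustrated with exact moments. The sketch below assumes the full frequency vector is available (no streaming and no sketching). It uses the standard identity for Rényi entropy, $H_p = \frac{1}{1-p}\log\frac{F_p}{F_1^p}$, which tends to the Shannon entropy as $p \to 1$. It is not the Compressed Counting estimator, and the helper names are hypothetical.

```python
import math


def moment_from_frequencies(freqs, p):
    """Exact F_p = sum_i |f_i|^p, skipping zero frequencies."""
    return sum(abs(f) ** p for f in freqs if f != 0)


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of the empirical distribution f_i / F_1."""
    total = moment_from_frequencies(freqs, 1)
    return -sum((f / total) * math.log(f / total) for f in freqs if f > 0)


def renyi_entropy(freqs, p):
    """Renyi entropy of order p != 1, expressed through F_p and F_1."""
    ratio = moment_from_frequencies(freqs, p) / moment_from_frequencies(freqs, 1) ** p
    return math.log(ratio) / (1 - p)


freqs = [5, 3, 1, 1]  # nonnegative counts, e.g. from an insertion-only stream
for p in (0.9, 0.99, 0.999, 1.001):
    print(p, renyi_entropy(freqs, p))
print("Shannon:", shannon_entropy(freqs))
```

As p approaches 1 from either side, the printed values converge to the Shannon entropy. This suggests why accurate estimates of $F_p$ for p close to 1 can be turned into entropy estimates.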

Estimating the p-th frequency moment of a data stream is a very heavily studied problem. The problem is actually trivial when p = 1, assuming the strict Turnstile model. The sample complexity of our proposed algorithm is essentially O(1) near…

Data Structures and Algorithms · Computer Science 2015-03-14 Ping Li
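The remark that p = 1 is trivial under the strict Turnstile model can be checked directly. If every coordinate stays nonnegative, then $F_1 = \sum_i |f_i| = \sum_i f_i$, which is simply the sum of all update values, so one counter suffices. This is a minimal sketch that assumes updates arrive as `(i, v)` pairs, a hypothetical input format. The exact dictionary exists only to verify the counter and would not be stored by a streaming algorithm.

```python
from collections import defaultdict


def f1_strict_turnstile(updates):
    """Return (streaming F_1, exact F_1) for a strict-turnstile update stream.

    Streaming side: one running sum of all update values v.
    Check side: the full frequency vector, which a real streaming
    algorithm could not afford to store.
    """
    counter = 0                 # the only state the streaming algorithm needs
    exact = defaultdict(int)    # kept solely for verification
    for i, v in updates:
        counter += v
        exact[i] += v
        # Strict turnstile promise: no coordinate ever goes negative.
        assert exact[i] >= 0, "update violates the strict turnstile model"
    return counter, sum(abs(f) for f in exact.values())


stream = [(1, 3), (2, 5), (1, -2), (3, 4), (2, -5), (3, 1)]
print(f1_strict_turnstile(stream))  # → (6, 6)
```

Without the nonnegativity promise the shortcut fails. For f = (-2, 2), the running sum is 0 while $F_1 = 4$.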

Counting is among the most fundamental operations in computing. For example, counting the pth frequency moment has been a very active area of research, in theoretical computer science, databases, and data mining. When p=1, the task (i.e.,…

Information Theory · Computer Science 2008-02-24 Ping Li

We consider the problem of sketching the $p$-th frequency moment of a vector, $p>2$, with multiplicative error at most $1\pm \epsilon$ and \emph{with high confidence} $1-\delta$. Despite the long sequence of work on this problem, tight…

Data Structures and Algorithms · Computer Science 2018-05-29 Sumit Ganguly, David P. Woodruff

For each $p \in (0,2]$, we present a randomized algorithm that returns an $\epsilon$-approximation of the $p$th frequency moment of a data stream $F_p = \sum_{i = 1}^n |f_i|^p$. The algorithm requires space $O(\epsilon^{-2} \log…

Data Structures and Algorithms · Computer Science 2010-06-21 Sumit Ganguly

A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called…

Data Structures and Algorithms · Computer Science 2011-04-26 Alexandr Andoni, Robert Krauthgamer, Krzysztof Onak

We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying $n$-dimensional vector presented as a sequence of additive updates in a stream. It…

Data Structures and Algorithms · Computer Science 2018-03-07 Vladimir Braverman, Emanuele Viola, David Woodruff, Lin F. Yang

Modern stream processing systems often need to track the frequency of distinct keys in a data stream in real-time. Since maintaining exact counts can require a prohibitive amount of memory, many applications rely on compact, probabilistic…

Data Structures and Algorithms · Computer Science 2026-04-29 Navid Eslami, Ioana O. Bercea, Rasmus Pagh, Niv Dayan
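The abstract describes only the setting (compact, probabilistic frequency summaries for distinct keys) and does not name its structure. As a generic point of reference, here is a minimal Count-Min sketch (Cormode and Muthukrishnan), a standard summary of this kind. It is not the method of the listed paper. The width, depth, and SHA-256-based per-row hashing are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """depth rows of width counters; with nonnegative updates, estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        # Derive a separate hash per row by prefixing the row index to the key.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a row's counter, so the minimum is tightest.
        return min(self.table[row][self._bucket(row, key)] for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=4)
stream = ["a"] * 50 + ["b"] * 20 + [f"rare{i}" for i in range(100)]
for key in stream:
    sketch.add(key)
print(sketch.estimate("a"), sketch.estimate("b"))  # each at least the true count
```

With only nonnegative increments, each row can overcount a key (other keys may share its bucket) but never undercount it. Taking the minimum over rows therefore picks the tightest of several upper bounds.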

We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…

Data Structures and Algorithms · Computer Science 2022-07-19 Dimitris Bertsimas, Vassilis Digalakis

Given data stream $D = \{p_1,p_2,...,p_m\}$ of size $m$ of numbers from $\{1,..., n\}$, the frequency of $i$ is defined as $f_i = |\{j: p_j = i\}|$. The $k$-th \emph{frequency moment} of $D$ is defined as $F_k = \sum_{i=1}^n f_i^k$. We…

Data Structures and Algorithms · Computer Science 2012-12-05 Vladimir Braverman, Rafail Ostrovsky
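For concreteness, the definitions $f_i = |\{j: p_j = i\}|$ and $F_k = \sum_{i=1}^n f_i^k$ can be evaluated exactly when the whole stream fits in memory. The sketch below is that exact baseline and not a streaming algorithm. Streaming methods aim to approximate the same quantity in far less space. Items with $f_i = 0$ are skipped, so $F_0$ counts distinct elements, the usual convention. The function name is hypothetical.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k of a stream of items: sum over occurring items of f_i ** k.

    Only items that actually occur are counted, so k = 0 gives the number
    of distinct elements and k = 1 gives the stream length m.
    """
    freqs = Counter(stream)
    return sum(f ** k for f in freqs.values())


D = [1, 2, 1, 3, 1, 2]  # f_1 = 3, f_2 = 2, f_3 = 1
print([frequency_moment(D, k) for k in (0, 1, 2, 3)])  # → [3, 6, 14, 36]
```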

In data stream applications, one of the critical issues is to estimate the frequency of each item in a given multiset, that is, a set in which each item can appear multiple times. The data streams in many applications are…

Data Structures and Algorithms · Computer Science 2020-01-07 Ning Li

In this paper we consider the problem of approximating frequency moments in the streaming model. Given a stream $D = \{p_1,p_2,\dots,p_m\}$ of numbers from $\{1,\dots, n\}$, the frequency of $i$ is defined as $f_i = |\{j: p_j = i\}|$. The…

Data Structures and Algorithms · Computer Science 2014-01-28 Vladimir Braverman, Jonathan Katzman, Charles Seidell, Gregory Vorsanger

We present an algorithm for computing $F_p$, the $p$th moment of an $n$-dimensional frequency vector of a data stream, for $2 < p < \log (n) $, to within $1\pm \epsilon$ factors, $\epsilon \in [n^{-1/p},1]$ with high constant probability.…

Data Structures and Algorithms · Computer Science 2015-03-19 Sumit Ganguly

The problem of estimating the pth moment F_p (p nonnegative and real) in data streams is as follows. There is a vector x which starts at 0, and many updates of the form x_i ← x_i + v come sequentially in a stream. The algorithm also…

Data Structures and Algorithms · Computer Science 2009-04-09 Daniel M. Kane, Jelani Nelson, David P. Woodruff

We present the first feasible method for sampling a dynamic data stream with deletions, where the sample consists of pairs $(k,C_k)$ of a value $k$ and its exact total count $C_k$. Our algorithms are for both Strict Turnstile data streams…

Data Structures and Algorithms · Computer Science 2012-09-26 Neta Barkay, Ely Porat, Bar Shalem

Skewness is a common occurrence in statistical applications. In recent years, various distribution families have been proposed to model skewed data by introducing unequal scales based on the median or mode. However, we argue that the point…

Methodology · Statistics 2024-01-10 Yiyuan She, Xiaoqiang Wu, Lizhu Tao, Debajyoti Sinha

Sine-skewed circular distributions are identifiable and have easily-computable trigonometric moments and a simple random number generation algorithm, whereas they are known to have relatively low levels of asymmetry. This study proposes a…

Methodology · Statistics 2024-02-16 Yoichi Miyata, Takayuki Shiohama, Toshihiro Abe

This paper focuses on three aspects of improving the current practice of stable random projections. First, we propose \emph{very sparse stable random projections} to significantly reduce the processing and storage cost, by…

Data Structures and Algorithms · Computer Science 2007-07-13 Ping Li
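To make "stable random projections" concrete, here is a minimal sketch of the basic dense version for p = 1 with Indyk's median estimator. Each sketch coordinate is $y_j = \sum_i r_{ji} f_i$ with i.i.d. standard Cauchy (1-stable) entries $r_{ji}$, so $y_j$ is Cauchy with scale $\|f\|_1 = F_1$. The median of $|y_j|$ then estimates $F_1$, because the median of $|C|$ for a standard Cauchy $C$ is 1. The sketch is linear, so turnstile updates are handled directly. It does not implement the very sparse variant or any estimator proposed in the paper. The class name, number of projections, and seeds are arbitrary assumptions.

```python
import math
import random
import statistics


class CauchySketch:
    """Linear sketch y = R f with i.i.d. standard Cauchy entries (1-stable)."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        # Standard Cauchy via the inverse CDF: tan(pi * (U - 1/2)).
        self.R = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
                  for _ in range(k)]
        self.y = [0.0] * k

    def update(self, i, v):
        # Turnstile update f_i <- f_i + v; by linearity, y_j += v * R[j][i].
        for j, row in enumerate(self.R):
            self.y[j] += v * row[i]

    def estimate_f1(self):
        # Each y_j ~ ||f||_1 * Cauchy, and median(|Cauchy|) = 1.
        return statistics.median(abs(yj) for yj in self.y)


n = 50
sketch = CauchySketch(n=n, k=1000, seed=42)
f = [0] * n                      # exact vector, kept only for comparison
rng = random.Random(7)
for _ in range(1000):
    i, v = rng.randrange(n), rng.choice([-1, 1, 2])
    sketch.update(i, v)
    f[i] += v
true_f1 = sum(abs(x) for x in f)
print(true_f1, round(sketch.estimate_f1(), 1))
```

With k projections, the relative error of the median estimator shrinks roughly like $1/\sqrt{k}$. The cost is that every update touches all k sketch coordinates, and that per-update processing cost is what sparser projection matrices aim to reduce.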

We consider the problem of high-dimensional heavy-tailed statistical estimation in the streaming setting, which is much harder than the traditional batch setting due to memory constraints. We cast this problem as stochastic convex…

Machine Learning · Statistics 2024-10-29 Aniket Das, Dheeraj Nagaraj, Soumyabrata Pal, Arun Suggala, Prateek Varshney

A data stream is viewed as a sequence of $M$ updates of the form $(\text{index},i,v)$ to an $n$-dimensional integer frequency vector $f$, where the update changes $f_i$ to $f_i + v$, and $v$ is an integer and assumed to be in $\{-m, ...,…

Data Structures and Algorithms · Computer Science 2010-06-01 Sumit Ganguly, Purushottam Kar