Related papers: A Very Efficient Scheme for Estimating Entropy of …

200 papers

The Shannon entropy is a widely used summary statistic in, for example, network traffic measurement, anomaly detection, neural computations, and spike trains. This study focuses on estimating the Shannon entropy of data streams. It is known that…

Data Structures and Algorithms · Computer Science 2009-10-09 Ping Li

Compressed Counting (CC) [22] was recently proposed for estimating the $\alpha$th frequency moments of data streams, where $0 < \alpha \le 2$. CC can be used for estimating Shannon entropy, which can be approximated by certain functions of the $\alpha$th…

Data Structures and Algorithms · Computer Science 2012-05-14 Ping Li

Compressed Counting (CC), based on maximally skewed stable random projections, was recently proposed for estimating the p-th frequency moments of data streams. The case p → 1 is extremely useful for estimating Shannon entropy of data…

Data Structures and Algorithms · Computer Science 2009-10-09 Ping Li

Compressed Counting (CC) was recently proposed for very efficiently computing the (approximate) $\alpha$th frequency moments of data streams, where $0 < \alpha \le 2$. Several estimators were reported including the geometric mean estimator,…

Data Structures and Algorithms · Computer Science 2008-08-14 Ping Li

We consider the problem of approximating the empirical Shannon entropy of a high-frequency data stream under the relaxed strict-turnstile model, when space limitations make exact computation infeasible. An equivalent measure of entropy is…

Computation · Statistics 2013-04-18 Peter Clifford, Ioana Ada Cosma

Counting is among the most fundamental operations in computing. For example, counting the pth frequency moment has been a very active area of research, in theoretical computer science, databases, and data mining. When p=1, the task (i.e.,…

Information Theory · Computer Science 2008-02-24 Ping Li

Estimating the p-th frequency moment of a data stream is a very heavily studied problem. The problem is actually trivial when p = 1, assuming the strict Turnstile model. The sample complexity of our proposed algorithm is essentially O(1) near…

Data Structures and Algorithms · Computer Science 2015-03-14 Ping Li

Estimation of Shannon and Rényi entropies of unknown discrete distributions is a fundamental problem in statistical property testing and an active research topic in both theoretical computer science and information theory. Tight bounds on…

Quantum Physics · Physics 2023-07-19 Tongyang Li, Xiaodi Wu

Estimating entropies from limited data series is known to be a non-trivial task. Naive estimations are plagued with both systematic (bias) and statistical errors. Here, we present a new 'balanced estimator' for entropy functionals Shannon,…

Statistical Mechanics · Physics 2008-04-30 Juan A. Bonachela, Haye Hinrichsen, Miguel A. Munoz

We conclude a sequence of work by giving near-optimal sketching and streaming algorithms for estimating Shannon entropy in the most general streaming model, with arbitrary insertions and deletions. This improves on prior results that obtain…

Data Structures and Algorithms · Computer Science 2008-12-18 Nicholas J. A. Harvey, Jelani Nelson, Krzysztof Onak

Data partitioning that maximizes/minimizes the Shannon entropy, or more generally the Rényi entropy, is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partition algorithms can be…

Data Structures and Algorithms · Computer Science 2025-11-05 Aryan Esmailpour, Sanjay Krishnan, Stavros Sintos

Algorithmic entropy and Shannon entropy are two conceptually different information measures, as the former is based on the size of programs and the latter on probability distributions. However, it is known that, for any recursive probability…

Information Theory · Computer Science 2010-06-03 Andreia Teixeira, Andre Souto, Armando Matos, Luis Antunes

A new method is proposed for analyzing complexity and studying the information in random geometric networks using the Tsallis entropy. The Tsallis entropy of the ensemble of random geometric networks is calculated based on the components of…

Statistical Mechanics · Physics 2025-02-20 O. K. Kazemi, S. M. Taheri

We propose skewed stable random projections for approximating the pth frequency moments of dynamic data streams (0<p<=2), which has been frequently studied in theoretical computer science and database communities. Our method significantly…

Data Structures and Algorithms · Computer Science 2008-02-07 Ping Li

Modern statistical estimation is often performed in a distributed setting where each sample belongs to a single user who shares their data with a central server. Users are typically concerned with preserving the privacy of their samples,…

Machine Learning · Computer Science 2023-05-16 Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Muñoz Medina, Umar Syed

The weak law of large numbers implies that, under mild assumptions on the source, the Rényi entropy per produced symbol converges (in probability) towards the Shannon entropy rate. This paper quantifies the speed of this convergence for…

Information Theory · Computer Science 2017-05-01 Maciej Skorski

We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…

Data Structures and Algorithms · Computer Science 2022-07-19 Dimitris Bertsimas, Vassilis Digalakis

Subword tokenization is a key part of many NLP pipelines. However, little is known about why some tokenizer and hyperparameter combinations lead to better downstream model performance than others. We propose that good tokenizers lead to…

Computation and Language · Computer Science 2023-06-30 Vilém Zouhar, Clara Meister, Juan Luis Gastaldi, Li Du, Mrinmaya Sachan, Ryan Cotterell

This article studies the fundamental problem of using i.i.d. coin tosses from an entropy source to efficiently generate random variables $X_i \sim P_i$ $(i \ge 1)$, where $(P_1, P_2, \dots)$ is a random sequence of rational discrete…

Data Structures and Algorithms · Computer Science 2026-05-08 Thomas L. Draper, Feras A. Saad

The entropy rate of sequential data streams naturally quantifies the complexity of the generative process. Thus entropy rate fluctuations could be used as a tool to recognize dynamical perturbations in signal sources, and could potentially be…

Information Theory · Computer Science 2014-03-24 Ishanu Chattopadhyay, Hod Lipson