Related papers: Entropy estimates of small data sets
We consider the problem of finite sample corrections for entropy estimation. New estimates of the Shannon entropy are proposed and their systematic error (the bias) is computed analytically. We find that our results cover correction…
We present a detailed derivation of some estimators of Shannon entropy for discrete distributions. They hold for finite samples of N points distributed into M "boxes", with N and M -> oo, but N/M < oo. In the high sampling regime (<< 1…
Reliable data-driven estimation of Shannon entropy from small data sets, where the number of examples is potentially smaller than the number of possible outcomes, is a critical matter in several applications. In this paper, we introduce a…
The Shannon entropy is a widely used summary statistic, for example, network traffic measurement, anomaly detection, neural computations, spike trains, etc. This study focuses on estimating Shannon entropy of data streams. It is known that…
We introduce unbiased estimators for the Shannon entropy and the class number, in the situation that we are able to take sequences of independent samples of arbitrary length.
We present a new class of estimators of Shannon entropy for severely undersampled discrete distributions. It is based on a generalization of an estimator proposed by T. Schuermann, which itself is a generalization of an estimator proposed…
Compressed Counting (CC)} was recently proposed for approximating the $\alpha$th frequency moments of data streams, for $0<\alpha \leq 2$. Under the relaxed strict-Turnstile model, CC dramatically improves the standard algorithm based on…
We revisit the well-studied problem of estimating the Shannon entropy of a probability distribution, now given access to a probability-revealing conditional sampling oracle. In this model, the oracle takes as input the representation of a…
A class of estimators of the R\'{e}nyi and Tsallis entropies of an unknown distribution $f$ in $\mathbb{R}^m$ is presented. These estimators are based on the $k$th nearest-neighbor distances computed from a sample of $N$ i.i.d. vectors with…
Estimation of Shannon and R\'enyi entropies of unknown discrete distributions is a fundamental problem in statistical property testing and an active research topic in both theoretical computer science and information theory. Tight bounds on…
Entropy is the measure of uncertainty in any data and is adopted for maximisation of mutual information in many remote sensing operations. The availability of wide entropy variations motivated us for an investigation over the suitability…
The Shannon entropy, and related quantities such as mutual information, can be used to quantify uncertainty and relevance. However, in practice, it can be difficult to compute these quantities for arbitrary probability distributions,…
Estimating the Shannon entropy of a discrete distribution from which we have only observed a small sample is challenging. Estimating other information-theoretic metrics, such as the Kullback-Leibler divergence between two sparsely sampled…
Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language. However, entropy must typically be estimated from observed data because researchers do not have access to the underlying…
Entropy and relative or cross entropy measures are two very fundamental concepts in information theory and are also widely used for statistical inference across disciplines. The related optimization problems, in particular the maximization…
Algorithmic entropy and Shannon entropy are two conceptually different information measures, as the former is based on size of programs and the later in probability distributions. However, it is known that, for any recursive probability…
A method of estimating the joint probability mass function of a pair of discrete random variables is described. This estimator is used to construct the conditional Shannon-R\'eyni-Tsallis entropies estimates. From there almost sure rates of…
There are numerous characterizations of Shannon entropy and Tsallis entropy as measures of information obeying certain properties. Using work by Faddeev and Furuichi, we derive a very simple characterization. Instead of focusing on the…
We consider the problem of approximating the empirical Shannon entropy of a high-frequency data stream under the relaxed strict-turnstile model, when space limitations make exact computation infeasible. An equivalent measure of entropy is…
Entropy estimation plays a significant role in biology, economics, physics, communication engineering and other disciplines. It is increasingly used in software engineering, e.g. in software confidentiality, software testing, predictive…