Related papers: Normalized information-based divergences
The minimum rate needed to accurately approximate a product distribution based on an unnormalized informational divergence is shown to be a mutual information. This result subsumes results of Wyner on common information and Han-Verdú on…
We study the complexity of approximations to the normalized information distance. We introduce a hierarchy of computable approximations by considering the number of oscillations. This is a function version of the difference hierarchy for…
Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased…
As a part of the construction of an information theory based on general probabilistic theories, we propose and investigate several distinguishability measures and "entropies" in general probabilistic theories. As their applications,…
By combining a bound on the absolute value of the difference of mutual information between two joint probability distributions with a fixed variational distance, and a bound on the probability of a maximal deviation in variational distance…
Information theory is a mathematical theory of learning with deep connections with topics as diverse as artificial intelligence, statistical physics, and biological evolution. Many primers on information theory paint a broad picture with…
To what extent can we distinguish one probability distribution from another? Are there quantitative measures of distinguishability? The goal of this tutorial is to approach such questions by introducing the notion of the "distance" between…
Bregman divergences are a class of distance-like comparison functions which play fundamental roles in optimization, statistics, and information theory. One important property of Bregman divergences is that they cause two useful formulations…
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to…
We define a measure of redundant information based on projections in the space of probability distributions. Redundant information between random variables is information that is shared between those variables. But in contrast to mutual…
Mutual information is a general statistical dependency measure which has found applications in representation learning, causality, domain generalization and computational biology. However, mutual information estimators are typically…
We examine how the mutual information between the output model and the empirical sample relates to the generalization of the algorithm in the context of stochastic convex optimization. Despite increasing interest in…
Complex systems often exhibit multiple levels of organization covering a wide range of physical scales, so the study of the hierarchical decomposition of their structure and function is frequently convenient. To better understand this…
Estimating Mutual Information (MI), a key measure of dependence of random quantities without specific modelling assumptions, is a challenging problem in high dimensions. We propose a novel mutual information estimator based on parametrizing…
Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each…
We derive upper bounds on the generalization error of a learning algorithm in terms of the mutual information between its input and output. The bounds provide an information-theoretic understanding of generalization in learning problems,…
In this paper we establish lower bounds on information divergence from a distribution to certain important classes of distributions, such as Gaussian, exponential, Gamma, Poisson, geometric, and binomial. These lower bounds are tight and for…
Randomness in scientific estimation is generally assumed to arise from unmeasured or uncontrolled factors. However, when combining subjective probability estimates, heterogeneity stemming from people's cognitive or information diversity is…
Observations on the past provide some hints about what will happen in the future, and this can be quantified using information theory. The "predictive information" defined in this way has connections to measures of complexity that have…
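The abstract on the normalized information distance above notes that it is based on Kolmogorov complexity and is therefore uncomputable, but that compression algorithms can be used to approximate it. A minimal Python sketch of that idea is the normalized compression distance, NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C is the length of the compressed input. Using zlib as the compressor, and the sample inputs, are assumptions made for illustration; this is not presented as the method of any paper listed here.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (an assumed stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Approximates the normalized information distance by replacing
    Kolmogorov complexity with a real compressor (zlib, an assumption).
    Small values indicate similar inputs; values near 1 indicate unrelated ones.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical example inputs: a highly repetitive text and seeded pseudo-random bytes.
text = b"the quick brown fox " * 50
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(1000))

print(f"NCD(text, text)  = {ncd(text, text):.3f}")   # small: text adds little given itself
print(f"NCD(text, noise) = {ncd(text, noise):.3f}")  # close to (or slightly above) 1
```

Because real compressors are imperfect, NCD can slightly exceed 1 or stay above 0 for identical inputs; the values are only an approximation of the uncomputable distance.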
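The abstract on normalized mutual information above concerns a score widely used to compare a clustering or classification against reference labels. As a minimal Python sketch, assuming natural logarithms and normalization by the arithmetic mean of the two entropies (only one of several normalizations in use), NMI(U, V) = I(U; V) / ((H(U) + H(V)) / 2) can be computed from label counts as follows. The function names are hypothetical, and the sketch does not reproduce that paper's bias analysis.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Empirical mutual information I(A; B) in nats between two labelings."""
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_a[x] * count_b[y]).
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def normalized_mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings are constant, so they agree trivially
    return mutual_information(a, b) / ((h_a + h_b) / 2)


truth = [0, 0, 1, 1, 2, 2]
print(normalized_mutual_information(truth, [1, 1, 0, 0, 2, 2]))  # same partition, relabeled
print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent labelings
```

The score is invariant to relabeling of clusters, which is why a permuted copy of the reference partition scores 1.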
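The abstract on Bregman divergences above describes a class of distance-like comparison functions. For a strictly convex, differentiable generator F, the standard definition is D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>. The Python sketch below (all function names are hypothetical) checks two well-known instances: F(x) = ||x||^2 yields the squared Euclidean distance, and the negative entropy F(x) = sum_i x_i log x_i yields sum_i p_i log(p_i / q_i) - sum_i p_i + sum_i q_i, which reduces to the Kullback-Leibler divergence when p and q are probability vectors.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bregman(F: Callable[[Vector], float],
            grad_F: Callable[[Vector], List[float]],
            p: Vector, q: Vector) -> float:
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner


# Generator 1: squared Euclidean norm, with gradient 2x.
def sq_norm(x: Vector) -> float:
    return sum(v * v for v in x)


def sq_norm_grad(x: Vector) -> List[float]:
    return [2.0 * v for v in x]


# Generator 2: negative Shannon entropy (entries must be positive), gradient log x + 1.
def neg_entropy(x: Vector) -> float:
    return sum(v * math.log(v) for v in x)


def neg_entropy_grad(x: Vector) -> List[float]:
    return [math.log(v) + 1.0 for v in x]


def kl(p: Vector, q: Vector) -> float:
    """Kullback-Leibler divergence for strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
print(bregman(sq_norm, sq_norm_grad, p, q))                     # equals sum((p_i - q_i)^2)
print(bregman(neg_entropy, neg_entropy_grad, p, q), kl(p, q))   # these two agree
print(bregman(neg_entropy, neg_entropy_grad, q, p))             # differs: not symmetric
```

The last line illustrates that a Bregman divergence is generally not symmetric, which is why it is described as distance-like rather than as a metric.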