Related papers: $\textit{sentropy}$: A Python Package for Revealin…
This article illustrates how to measure the heterogeneity of spatial data presenting a finite number of categories via computation of spatial entropy. The R package SpatEntropy contains functions for the computation of entropy and spatial…
Mixture distributions are a workhorse model for multimodal data in information theory, signal processing, and machine learning. Yet even when each component density is simple, the differential entropy of the mixture is notoriously hard to…
The synthpop package for R https://www.synthpop.org.uk provides tools to allow data custodians to create synthetic versions of confidential microdata that can be distributed with fewer restrictions than the original. The synthesis can be…
Information theory, i.e. the mathematical analysis of information and of its processing, has become a tenet of modern science; yet, its use in real-world studies is usually hindered by its computational complexity, the lack of coherent…
The diversity across outputs generated by LLMs shapes perception of their quality and utility. High lexical diversity is often desirable, but there is no standard method to measure this property. Templated answer structures and ``canned''…
Shannon entropy is not the only entropy that is relevant to machine-learning datasets, nor possibly even the most important one. Traditional entropies such as Shannon entropy capture information represented by elements' frequencies but not…
Sample selection improves the efficiency and effectiveness of machine learning models by providing informative and representative samples. Typically, samples can be modeled as a sample graph, where nodes are samples and edges represent…
Measuring inter-dataset similarity is an important task in machine learning and data mining with various use cases and applications. Existing methods for measuring inter-dataset similarity are computationally expensive, limited, or…
Approximation of entropies of various types using machine learning (ML) regression methods are shown for the first time. The ML models presented in this study define the complexity of the short time series by approximating dissimilar…
The concept of Entropy plays a key role in Information Theory, Statistics, and Machine Learning.This paper introduces a new entropy measure, called the t-entropy, which exploits the concavity of the inverse-tan function. We analytically…
Shannon Entropy is the preeminent tool for measuring the level of uncertainty (and conversely, information content) in a random variable. In the field of communications, entropy can be used to express the information content of given…
Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard…
The profile of a sample is the multiset of its symbol frequencies. We show that for samples of discrete distributions, profile entropy is a fundamental measure unifying the concepts of estimation, inference, and compression. Specifically,…
Entropy rate of sequential data-streams naturally quantifies the complexity of the generative process. Thus entropy rate fluctuations could be used as a tool to recognize dynamical perturbations in signal sources, and could potentially be…
We introduce Extrema-Segmented Entropy (ExSEnt), a feature-decomposed framework for quantifying time-series complexity that separates temporal from amplitude contributions. The method partitions a signal into monotonic segments by detecting…
Data partitioning that maximizes/minimizes the Shannon entropy, or more generally the R\'enyi entropy is a crucial subroutine in data compression, columnar storage, and cardinality estimation algorithms. These partition algorithms can be…
This paper presents a command-line tool, called Entropia, that implements a family of conformance checking measures for process mining founded on the notion of entropy from information theory. The measures allow quantifying classical…
The entropy of a quantum system is a measure of its randomness, and has applications in measuring quantum entanglement. We study the problem of measuring the von Neumann entropy, $S(\rho)$, and R\'enyi entropy, $S_\alpha(\rho)$ of an…
The notion of software entropy is often invoked to describe the tendency of software systems to become increasingly disordered as they evolve, yet existing approaches to quantify it are largely heuristic. In this work we introduce a formal…
We propose a new type of entropic descriptor that is able to quantify the statistical complexity (a measure of complex behaviour) by taking simultaneously into account the average departures of a system's entropy S from both its maximum…