Related papers: Minimum Description Length codes are critical
Model selection is central to statistics, and many learning problems can be formulated as model selection problems. In this paper, we treat the problem of selecting a maximum entropy model given various feature subsets and their moments, as…
The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result…
In the signal processing and statistics literature, the minimum description length (MDL) principle is a popular tool for choosing model complexity. Successful examples include signal denoising and variable selection in linear regression,…
The power of sparse signal modeling with learned over-complete dictionaries has been demonstrated in a variety of applications and fields, from signal processing to statistical inference and machine learning. However, the statistical…
This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL…
We analyze differences between two information-theoretically motivated approaches to statistical inference and model selection: the Minimum Description Length (MDL) principle, and the Minimum Message Length (MML) principle. Based on this…
The power of sparse signal coding with learned dictionaries has been demonstrated in a variety of applications and fields, from signal processing to statistical inference and machine learning. However, the statistical properties of these…
The Minimum Description Length (MDL) principle is solidly based on a provably ideal method of inference using Kolmogorov complexity. We test how the theory behaves in practice on a general problem in model selection: that of learning the…
Complexity is a fundamental concept underlying statistical learning theory that aims to inform generalization performance. Parameter count, while successful in low-dimensional settings, is not well-justified for overparameterized settings…
Minimum Description Length (MDL) is an important principle for induction and prediction, with strong relations to optimal Bayesian learning. This paper deals with learning non-i.i.d. processes by means of two-part MDL, where the underlying…
This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the…
An efficient representation of observed data has many benefits in various domains of engineering and science. Representing static data sets, such as images, is a living branch in machine learning and eases downstream tasks, such as…
A major challenge in designing efficient statistical supervised learning algorithms is finding representations that perform well not only on available training samples but also on unseen data. While the study of representation learning has…
This paper introduces a new notion of dimensionality of probabilistic models from an information-theoretic view point. We call it the "descriptive dimension"(Ddim). We show that Ddim coincides with the number of independent parameters for…
To measure how well pretrained representations encode some linguistic property, it is common to use accuracy of a probe, i.e. a classifier trained to predict the property from the representations. Despite widespread adoption of probes,…
The (non-)equivalence of canonical and microcanonical ensembles is a fundamental question in statistical physics, concerning whether the use of soft and hard constraints in the maximum-entropy construction leads to the same description of a…
State-of-the-art neural networks can be trained to become remarkable solutions to many problems. But while these architectures can express symbolic, perfect solutions, trained models often arrive at approximations instead. We show that the…
The Minimum Description Length (MDL) principle states that the optimal model for a given data set is that which compresses it best. Due to practial limitations the model can be restricted to a class such as linear regression models, which…
It is shown that the two-part Minimum Description Length Principle can be used to discriminate among different models that can explain a given observed dataset. The description length is chosen to be the sum of the lengths of the message…
When constructing models of the world, we aim for optimal compressions: models that include as few details as possible while remaining as accurate as possible. But which details -- or features measured in data -- should we choose to include…