Related papers: Minimum Description Length and Generalization Guar…

Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors

We establish in-expectation and tail bounds on the generalization error of representation learning type algorithms. The bounds are in terms of the relative entropy between the distribution of the representations extracted from the training…

Machine Learning · Statistics 2025-03-21 Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski

Information-Theoretic Probing with Minimum Description Length

To measure how well pretrained representations encode some linguistic property, it is common to use accuracy of a probe, i.e. a classifier trained to predict the property from the representations. Despite widespread adoption of probes,…

Computation and Language · Computer Science 2020-03-30 Elena Voita, Ivan Titov

Minimum Description Length Revisited

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL…

Methodology · Statistics 2019-12-19 Peter Grünwald, Teemu Roos

Minimum Description Length Principle in Supervised Learning with Application to Lasso

The minimum description length (MDL) principle in supervised learning is studied. One of the most important theories for the MDL principle is Barron and Cover's theory (BC theory), which gives a mathematical justification of the MDL…

Information Theory · Computer Science 2016-07-12 Masanori Kawakita, Jun'ichi Takeuchi

Learning Optimal Representations with the Decodable Information Bottleneck

We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information…

Machine Learning · Computer Science 2021-07-19 Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam

A Minimum Description Length Approach to Regularization in Neural Networks

State-of-the-art neural networks can be trained to become remarkable solutions to many problems. But while these architectures can express symbolic, perfect solutions, trained models often arrive at approximations instead. We show that the…

Machine Learning · Computer Science 2025-09-09 Matan Abudy, Orr Well, Emmanuel Chemla, Roni Katzir, Nur Lan

Sparsification and feature selection by compressive linear regression

The Minimum Description Length (MDL) principle states that the optimal model for a given data set is that which compresses it best. Due to practical limitations the model can be restricted to a class such as linear regression models, which…

Machine Learning · Statistics 2015-03-13 Florin Popescu, Daniel Renz

Generalization Guarantees for Multi-View Representation Learning and Application to Regularization via Gaussian Product Mixture Prior

We study the problem of distributed multi-view representation learning. In this problem, each of $K$ agents observes one distinct, possibly statistically correlated, view and independently extracts from it a suitable representation in a manner…

Machine Learning · Statistics 2025-04-28 Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski

Sequential Learning Of Neural Networks for Prequential MDL

Minimum Description Length (MDL) provides a framework and an objective for principled model evaluation. It formalizes Occam's Razor and can be applied to data from non-stationary sources. In the prequential formulation of MDL, the objective…

Machine Learning · Statistics 2022-10-17 Jorg Bornschein, Yazhe Li, Marcus Hutter

Representation Learning from Limited Educational Data with Crowdsourced Labels

Representation learning has been proven to play an important role in the unprecedented success of machine learning models in numerous tasks, such as machine translation, face recognition and recommendation. The majority of existing…

Machine Learning · Computer Science 2020-09-24 Wentao Wang, Guowei Xu, Wenbiao Ding, Gale Yan Huang, Guoliang Li, Jiliang Tang, Zitao Liu

A Theory of Machine Understanding via the Minimum Description Length Principle

Deep neural networks trained through end-to-end learning have achieved remarkable success across various domains in the past decade. However, the end-to-end learning strategy, originally designed to minimize predictive loss in a black-box…

Machine Learning · Computer Science 2025-06-11 Canlin Zhang, Xiuwen Liu

Do Compressed Representations Generalize Better?

One of the most studied problems in machine learning is finding reasonable constraints that guarantee the generalization of a learning algorithm. These constraints are usually expressed as some simplicity assumptions on the target. For…

Machine Learning · Computer Science 2020-01-03 Hassan Hafez-Kolahi, Shohreh Kasaei, Mahdiyeh Soleymani-Baghshah

Representation Learning: A Review and New Perspectives

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind…

Machine Learning · Computer Science 2014-04-24 Yoshua Bengio, Aaron Courville, Pascal Vincent

Asymptotics of Discrete MDL for Online Prediction

Minimum Description Length (MDL) is an important principle for induction and prediction, with strong relations to optimal Bayesian learning. This paper deals with learning non-i.i.d. processes by means of two-part MDL, where the underlying…

Information Theory · Computer Science 2007-07-13 Jan Poland, Marcus Hutter

Differential Description Length for Hyperparameter Selection in Machine Learning

This paper introduces a new method for model selection and more generally hyperparameter selection in machine learning. Minimum description length (MDL) is an established method for model selection, which is however not directly aimed at…

Machine Learning · Computer Science 2019-05-23 Mojtaba Abolfazli, Anders Host-Madsen, June Zhang

Learning from Label Proportions in Brain-Computer Interfaces: Online Unsupervised Learning with Guarantees

Objective: Using traditional approaches, a Brain-Computer Interface (BCI) requires the collection of calibration data for new subjects prior to online use. Calibration time can be reduced or eliminated e.g. by transfer of a pre-trained…

Machine Learning · Statistics 2017-07-05 D Hübner, T Verhoeven, K Schmid, K-R Müller, M Tangermann, P-J Kindermans

Compression-Based Regularization with an Application to Multi-Task Learning

This paper investigates, from information theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from data, i.e., using fewer bits than needed to…

Machine Learning · Statistics 2018-11-14 Matías Vera, Leonardo Rey Vega, Pablo Piantanida

A study of the classification of low-dimensional data with supervised manifold learning

Supervised manifold learning methods learn data representations by preserving the geometric structure of data while enhancing the separation between data samples from different classes. In this work, we propose a theoretical study of…

Machine Learning · Computer Science 2018-01-08 Elif Vural, Christine Guillemot

Minimum Description Length Principle for Maximum Entropy Model Selection

Model selection is central to statistics, and many learning problems can be formulated as model selection problems. In this paper, we treat the problem of selecting a maximum entropy model given various feature subsets and their moments, as…

Information Theory · Computer Science 2013-11-28 Gaurav Pandey, Ambedkar Dukkipati

Multi-Label Learning with Provable Guarantee

Here we study the problem of learning labels for large text corpora where each text can be assigned a variable number of labels. The problem might seem trivial when the label dimensionality is small and can be easily solved using a series…

Machine Learning · Computer Science 2016-11-02 Sayantan Dasgupta