Related papers: Learning the Information Divergence

The alpha-beta divergence for real and complex data

Divergences are fundamental to the information criteria that underpin most signal processing algorithms. The alpha-beta family of divergences, designed for non-negative data, offers a versatile framework that parameterizes and continuously…

Machine Learning · Computer Science 2026-03-27 Sergio Cruces

Adversarial $\alpha$-divergence Minimization for Bayesian Approximate Inference

Neural networks are popular state-of-the-art models for many different tasks. They are often trained via back-propagation to find a value of the weights that correctly predicts the observed data. Although back-propagation has shown good…

Machine Learning · Statistics 2020-12-29 Simón Rodríguez Santana, Daniel Hernández-Lobato

Computing Divergences between Discrete Decomposable Models

There are many applications that benefit from computing the exact divergence between 2 discrete probability measures, including machine learning. Unfortunately, in the absence of any assumptions on the structure or independencies within…

Machine Learning · Computer Science 2023-10-16 Loong Kuan Lee, Nico Piatkowski, François Petitjean, Geoffrey I. Webb

Ensemble Estimation of Information Divergence

Recent work has focused on the problem of nonparametric estimation of information divergence functionals. Many existing approaches are restrictive in their assumptions on the density support set or require difficult calculations at the…

Information Theory · Computer Science 2021-07-30 Kevin R. Moon, Kumar Sricharan, Kristjan Greenewald, Alfred O. Hero

Learnability for the Information Bottleneck

The Information Bottleneck (IB) method (Tishby et al., 2000) provides an insightful and principled approach for balancing compression and prediction for representation learning. The IB objective $I(X;Z)-\beta I(Y;Z)$ employs a…

Machine Learning · Computer Science 2019-10-23 Tailin Wu, Ian Fischer, Isaac L. Chuang, Max Tegmark

Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality

Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify…

Information Theory · Computer Science 2022-10-13 Vudtiwat Ngampruetikorn, David J. Schwab

Alpha/Beta Divergences and Tweedie Models

We describe the underlying probabilistic interpretation of alpha and beta divergences. We first show that beta divergences are inherently tied to Tweedie distributions, a particular type of exponential family, known as exponential…

Machine Learning · Statistics 2012-09-20 Y. Kenan Yilmaz, A. Taylan Cemgil

Reliable Uncertainties for Bayesian Neural Networks using Alpha-divergences

Bayesian Neural Networks (BNNs) often end up uncalibrated after training, usually tending towards overconfidence. Devising effective calibration methods with low impact in terms of computational complexity is thus of central interest. In…

Machine Learning · Computer Science 2020-08-18 Hector J. Hortua, Luigi Malago, Riccardo Volpi

Toward Optimal Feature Selection in Naive Bayes for Text Categorization

Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on the…

Machine Learning · Statistics 2016-11-15 Bo Tang, Steven Kay, Haibo He

Ask for More Than Bayes Optimal: A Theory of Indecisions for Classification

Selective classification is a powerful tool for automated decision-making in high-risk scenarios, allowing classifiers to act only when confident and abstain when uncertainty is high. Given a target accuracy, our goal is to minimize…

Statistics Theory · Mathematics 2025-10-28 Mohamed Ndaoud, Peter Radchenko, Bradley Rava

Meta learning of bounds on the Bayes classifier error

Meta learning uses information from base learners (e.g. classifiers or estimators) as well as information about the learning problem to improve upon the performance of a single base learner. For example, the Bayes error rate of a given…

Machine Learning · Computer Science 2016-03-11 Kevin R. Moon, Veronique Delouille, Alfred O. Hero

Learning unbiased features

A key element in transfer learning is representation learning; if representations can be developed that expose the relevant factors underlying the data, then new tasks and domains can be learned readily based on mappings of these salient…

Machine Learning · Computer Science 2014-12-18 Yujia Li, Kevin Swersky, Richard Zemel

Optimal Data Reduction under Information-Theoretic Criteria

Selecting an optimal subset of features or instances under an information theoretic criterion has become an effective preprocessing strategy for reducing data complexity while preserving essential information. This study investigates two…

Optimization and Control · Mathematics 2025-08-25 Taotao He, Jun Luo, Junkai Zhao

Theoretical bounds on estimation error for meta-learning

Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent successes in few-shot learning and related problems are encouraging signs that these models…

Machine Learning · Statistics 2020-10-15 James Lucas, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel

Learning Not to Learn: Training Deep Neural Networks with Biased Data

We propose a novel regularization algorithm to train deep neural networks, in which data at training time is severely biased. Since a neural network efficiently learns data distribution, a network is likely to learn the bias information to…

Computer Vision and Pattern Recognition · Computer Science 2019-04-16 Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, Junmo Kim

Reconciling meta-learning and continual learning with online mixtures of tasks

Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not advantageous, for instance, when tasks are considerably…

Machine Learning · Computer Science 2019-06-20 Ghassen Jerfel, Erin Grant, Thomas L. Griffiths, Katherine Heller

Fast Convergence Rates for Distributed Non-Bayesian Learning

We consider the problem of distributed learning, where a network of agents collectively aim to agree on a hypothesis that best explains a set of distributed observations of conditionally independent random processes. We propose a…

Optimization and Control · Mathematics 2017-04-12 Angelia Nedić, Alex Olshevsky, César A. Uribe

Variational Information Maximization for Feature Selection

Feature selection is one of the most fundamental problems in machine learning. An extensive body of work on information-theoretic feature selection exists which is based on maximizing mutual information between subsets of features and class…

Machine Learning · Statistics 2016-06-10 Shuyang Gao, Greg Ver Steeg, Aram Galstyan

Information Acquisition with $\alpha$-Divergence Costs

Building on the $f$-information model of Bloedel et al. (2025), this paper introduces a one-parameter family of information acquisition models and characterizes optimal information acquisition. This family extends the mutual information…

Theoretical Economics · Economics 2026-05-29 Takashi Ui

Empirically Estimable Classification Bounds Based on a New Divergence Measure

Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification…

Information Theory · Computer Science 2015-02-11 Visar Berisha, Alan Wisler, Alfred O. Hero, Andreas Spanias