Related papers: Learning the Information Divergence
Divergences are fundamental to the information criteria that underpin most signal processing algorithms. The alpha-beta family of divergences, designed for non-negative data, offers a versatile framework that parameterizes and continuously…
Neural networks are popular state-of-the-art models for many different tasks. They are often trained via back-propagation to find a value of the weights that correctly predicts the observed data. Although back-propagation has shown good…
There are many applications that benefit from computing the exact divergence between two discrete probability measures, including machine learning. Unfortunately, in the absence of any assumptions on the structure or independencies within…
Recent work has focused on the problem of nonparametric estimation of information divergence functionals. Many existing approaches are restrictive in their assumptions on the density support set or require difficult calculations at the…
The Information Bottleneck (IB) method (\cite{tishby2000information}) provides an insightful and principled approach for balancing compression and prediction for representation learning. The IB objective $I(X;Z)-\beta I(Y;Z)$ employs a…
Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify…
We describe the underlying probabilistic interpretation of alpha and beta divergences. We first show that beta divergences are inherently tied to Tweedie distributions, a particular type of exponential family, known as exponential…
Bayesian Neural Networks (BNNs) are often poorly calibrated after training, usually tending towards overconfidence. Devising effective calibration methods with low computational overhead is thus of central interest. In…
Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on the…
Selective classification is a powerful tool for automated decision-making in high-risk scenarios, allowing classifiers to act only when confident and abstain when uncertainty is high. Given a target accuracy, our goal is to minimize…
Meta learning uses information from base learners (e.g. classifiers or estimators) as well as information about the learning problem to improve upon the performance of a single base learner. For example, the Bayes error rate of a given…
A key element in transfer learning is representation learning; if representations can be developed that expose the relevant factors underlying the data, then new tasks and domains can be learned readily based on mappings of these salient…
Selecting an optimal subset of features or instances under an information theoretic criterion has become an effective preprocessing strategy for reducing data complexity while preserving essential information. This study investigates two…
Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent successes in few-shot learning and related problems are encouraging signs that these models…
We propose a novel regularization algorithm to train deep neural networks when the data at training time is severely biased. Since a neural network efficiently learns the data distribution, a network is likely to learn the bias information to…
Learning-to-learn or meta-learning leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not advantageous, for instance, when tasks are considerably…
We consider the problem of distributed learning, where a network of agents collectively aims to agree on a hypothesis that best explains a set of distributed observations of conditionally independent random processes. We propose a…
Feature selection is one of the most fundamental problems in machine learning. An extensive body of work on information-theoretic feature selection exists which is based on maximizing mutual information between subsets of features and class…
Building on the $f$-information model of Bloedel et al. (2025), this paper introduces a one-parameter family of information acquisition models and characterizes optimal information acquisition. This family extends the mutual information…
Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification…