Related papers: An information-theoretic learning model based on i…
A current assumption of most clustering methods is that the training data and future data are taken from the same distribution. However, this assumption may not hold in most real-world scenarios. In this paper, we propose an information…
Given a task of predicting $Y$ from $X$, a loss function $L$, and a set of probability distributions $\Gamma$ on $(X,Y)$, what is the optimal decision rule minimizing the worst-case expected loss over $\Gamma$? In this paper, we address…
Empirical risk minimization stands behind most optimization in supervised machine learning. Under this scheme, labeled data is used to approximate an expected cost (risk), and a learning algorithm updates model-defining parameters in search…
The empirical risk minimization approach to data-driven decision making requires access to training data drawn under the same conditions as those that will be faced when the decision rule is deployed. However, in a number of settings, we…
Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent success in few-shot learning and related problems are encouraging signs that these models…
We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution)…
This paper investigates, from information theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from data, i.e., using fewer bits than needed to…
How can we effectively remove or ''unlearn'' undesirable information, such as specific features or the influence of individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce…
Exponential models of distributions are widely used in machine learning for classiffication and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we…
Feature selection is one of the most fundamental problems in machine learning. An extensive body of work on information-theoretic feature selection exists which is based on maximizing mutual information between subsets of features and class…
The goal of regression and classification methods in supervised learning is to minimize the empirical risk, that is, the expectation of some loss function quantifying the prediction error under the empirical distribution. When facing scarce…
The concept of a minimax classifier is well-established in statistical decision theory, but its implementation via neural networks remains challenging, particularly in scenarios with imbalanced training data having a limited number of…
Driven by applications in telecommunication networks, we explore the simulation task of estimating rare event probabilities for tandem queues in their steady state. Existing literature has recognized that importance sampling methods can be…
Many machine learning models appear to deploy effortlessly under distribution shift, and perform well on a target distribution that is considerably different from the training distribution. Yet, learning theory of distribution shift bounds…
Importance sampling approximates expectations with respect to a target measure by using samples from a proposal measure. The performance of the method over large classes of test functions depends heavily on the closeness between both…
We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top…
A central challenge to applying many off-policy reinforcement learning algorithms to real world problems is the variance introduced by importance sampling. In off-policy learning, the agent learns about a different policy than the one being…
We study problem-dependent rates, i.e., generalization errors that scale near-optimally with the variance, the effective loss, or the gradient norms evaluated at the "best hypothesis." We introduce a principled framework dubbed "uniform…
We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set $\mathcal{G}$ up to the smallest possible additive term, called the convergence rate. When the…
In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted. We first make a simple observation: in a variety of such settings, the…