Related papers: An information-theoretic learning model based on i…

Information Theoretical Importance Sampling Clustering

A current assumption of most clustering methods is that the training data and future data are taken from the same distribution. However, this assumption may not hold in most real-world scenarios. In this paper, we propose an information…

Machine Learning · Statistics 2023-05-31 Jiangshe Zhang, Lizhen Ji, Meng Wang

A Minimax Approach to Supervised Learning

Given a task of predicting $Y$ from $X$, a loss function $L$, and a set of probability distributions $\Gamma$ on $(X,Y)$, what is the optimal decision rule minimizing the worst-case expected loss over $\Gamma$? In this paper, we address…

Machine Learning · Statistics 2017-07-05 Farzan Farnia, David Tse

Taylor Learning

Empirical risk minimization stands behind most optimization in supervised machine learning. Under this scheme, labeled data is used to approximate an expected cost (risk), and a learning algorithm updates model-defining parameters in search…

Machine Learning · Statistics 2023-05-25 James Schmidt

Learning from a Biased Sample

The empirical risk minimization approach to data-driven decision making requires access to training data drawn under the same conditions as those that will be faced when the decision rule is deployed. However, in a number of settings, we…

Methodology · Statistics 2025-09-17 Roshni Sahoo, Lihua Lei, Stefan Wager

Theoretical bounds on estimation error for meta-learning

Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent successes in few-shot learning and related problems are encouraging signs that these models…

Machine Learning · Statistics 2020-10-15 James Lucas, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel

Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution)…

Machine Learning · Statistics 2020-02-20 Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier

Compression-Based Regularization with an Application to Multi-Task Learning

This paper investigates, from information theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from data, i.e., using fewer bits than needed to…

Machine Learning · Statistics 2018-11-14 Matías Vera, Leonardo Rey Vega, Pablo Piantanida

Machine Unlearning via Information Theoretic Regularization

How can we effectively remove or "unlearn" undesirable information, such as specific features or the influence of individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce…

Machine Learning · Computer Science 2025-12-30 Shizhou Xu, Thomas Strohmer

The Minimum Information Principle for Discriminative Learning

Exponential models of distributions are widely used in machine learning for classification and modelling. It is well known that they can be interpreted as maximum entropy models under empirical expectation constraints. In this work, we…

Machine Learning · Computer Science 2012-07-19 Amir Globerson, Naftali Tishby

Variational Information Maximization for Feature Selection

Feature selection is one of the most fundamental problems in machine learning. An extensive body of work on information-theoretic feature selection exists which is based on maximizing mutual information between subsets of features and class…

Machine Learning · Statistics 2016-06-10 Shuyang Gao, Greg Ver Steeg, Aram Galstyan

Regularization via Mass Transportation

The goal of regression and classification methods in supervised learning is to minimize the empirical risk, that is, the expectation of some loss function quantifying the prediction error under the empirical distribution. When facing scarce…

Optimization and Control · Mathematics 2019-07-15 Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Peyman Mohajerin Esfahani

Deep Minimax Classifiers for Imbalanced Datasets with a Small Number of Minority Samples

The concept of a minimax classifier is well-established in statistical decision theory, but its implementation via neural networks remains challenging, particularly in scenarios with imbalanced training data having a limited number of…

Machine Learning · Computer Science 2026-01-07 Hansung Choi, Daewon Seo

When Machine Learning Meets Importance Sampling: A More Efficient Rare Event Estimation Approach

Driven by applications in telecommunication networks, we explore the simulation task of estimating rare event probabilities for tandem queues in their steady state. Existing literature has recognized that importance sampling methods can be…

Machine Learning · Computer Science 2025-04-22 Ruoning Zhao, Xinyun Chen

Beyond Discrepancy: A Closer Look at the Theory of Distribution Shift

Many machine learning models appear to deploy effortlessly under distribution shift, and perform well on a target distribution that is considerably different from the training distribution. Yet, learning theory of distribution shift bounds…

Machine Learning · Computer Science 2024-05-30 Robi Bhattacharjee, Nick Rittler, Kamalika Chaudhuri

Importance Sampling and Necessary Sample Size: an Information Theory Approach

Importance sampling approximates expectations with respect to a target measure by using samples from a proposal measure. The performance of the method over large classes of test functions depends heavily on the closeness between both…

Computation · Statistics 2016-09-01 Daniel Sanz-Alonso

Invariant Risk Minimization

We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top…

Machine Learning · Statistics 2020-03-31 Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, David Lopez-Paz

Importance Sampling Placement in Off-Policy Temporal-Difference Methods

A central challenge to applying many off-policy reinforcement learning algorithms to real world problems is the variance introduced by importance sampling. In off-policy learning, the agent learns about a different policy than the one being…

Machine Learning · Computer Science 2022-06-20 Eric Graves, Sina Ghiassian

Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory

We study problem-dependent rates, i.e., generalization errors that scale near-optimally with the variance, the effective loss, or the gradient norms evaluated at the "best hypothesis." We introduce a principled framework dubbed "uniform…

Machine Learning · Statistics 2020-12-25 Yunbei Xu, Assaf Zeevi

Fast learning rates in statistical inference through aggregation

We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set $\mathcal{G}$ up to the smallest possible additive term, called the convergence rate. When the…

Statistics Theory · Mathematics 2009-09-09 Jean-Yves Audibert

Learning with Bad Training Data via Iterative Trimmed Loss Minimization

In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted. We first make a simple observation: in a variety of such settings, the…

Machine Learning · Computer Science 2019-02-20 Yanyao Shen, Sujay Sanghavi