Related papers: Learning from a Biased Sample

Statistical Learning from Biased Training Samples

With the deluge of digitized information in the Big Data era, massive datasets are becoming increasingly available for learning predictive models. However, in many practical situations, the poor control of the data acquisition processes may…

Machine Learning · Statistics · 2022-11-02 · Stephan Clémençon, Pierre Laforgue

Policy Learning under Biased Sample Selection

Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed…

Econometrics · Economics · 2023-04-25 · Lihua Lei, Roshni Sahoo, Stefan Wager

Learning from Survey Training Samples: Rate Bounds for Horvitz-Thompson Risk Minimizers

The generalization ability of minimizers of the empirical risk in the context of binary classification has been investigated under a wide variety of complexity assumptions for the collection of classifiers over which optimization is…

Statistics Theory · Mathematics · 2019-01-21 · Stephan Clémençon, Patrice Bertail, Guillaume Papa

The Risks of Invariant Risk Minimization

Invariant Causal Prediction (Peters et al., 2016) is a technique for out-of-distribution generalization which assumes that some aspects of the data distribution vary across the training set but that the underlying causal mechanisms remain…

Machine Learning · Computer Science · 2021-03-30 · Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

Conditional Risk Minimization for Stochastic Processes

We study the task of learning from non-i.i.d. data. In particular, we aim at learning predictors that minimize the conditional risk for a stochastic process, i.e. the expected loss of the predictor on the next point conditioned on the set…

Machine Learning · Statistics · 2016-03-15 · Alexander Zimin, Christoph H. Lampert

Robust Generalization despite Distribution Shift via Minimum Discriminating Information

Training models that perform well under distribution shifts is a central challenge in machine learning. In this paper, we introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the…

Machine Learning · Computer Science · 2021-10-28 · Tobias Sutter, Andreas Krause, Daniel Kuhn

Joint empirical risk minimization for instance-dependent positive-unlabeled data

Learning from positive and unlabeled data (PU learning) is an actively researched machine learning task. The goal is to train a binary classification model on a training dataset in which only some of the positives are labeled, and…

Machine Learning · Statistics · 2023-12-29 · Wojciech Rejchel, Paweł Teisseyre, Jan Mielniczuk

Learning from Positive and Unlabeled Data under the Selected At Random Assumption

For many interesting tasks, such as medical diagnosis and web page classification, a learner only has access to some positively labeled examples and many unlabeled examples. Learning from this type of data requires making assumptions about…

Machine Learning · Computer Science · 2018-08-28 · Jessa Bekker, Jesse Davis

Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization

Training machine learning and statistical models often involves optimizing a data-driven risk criterion. The risk is usually computed with respect to the empirical data distribution, but this may result in poor and unstable out-of-sample…

Machine Learning · Statistics · 2024-11-11 · Nicola Bariletto, Nhat Ho

Supervising Feature Influence

Causal influence measures for machine learnt classifiers shed light on the reasons behind classification, and aid in identifying influential input features and revealing their biases. However, such analyses involve evaluating the classifier…

Machine Learning · Computer Science · 2018-04-10 · Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson

Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data

Empirical risk minimization is the main tool for prediction problems, but its extension to relational data remains unsolved. We solve this problem using recent ideas from graph sampling theory to (i) define an empirical risk for relational…

Machine Learning · Statistics · 2019-02-25 · Victor Veitch, Morgane Austern, Wenda Zhou, David M. Blei, Peter Orbanz

Robust estimation of risks from small samples

Data-driven risk analysis involves the inference of probability distributions from measured or simulated data. In the case of a highly reliable system, such as the electricity grid, the amount of relevant data is often exceedingly limited,…

Methodology · Statistics · 2017-07-11 · Simon H. Tindemans, Goran Strbac

Efficient learning with robust gradient descent

Minimizing the empirical risk is a popular training strategy, but for learning tasks where the data may be noisy or heavy-tailed, one may require many observations in order to generalize well. To achieve better performance under less…

Machine Learning · Statistics · 2018-10-16 · Matthew J. Holland, Kazushi Ikeda

On Learning the Optimal Regularization Parameter in Inverse Problems

Selecting the best regularization parameter in inverse problems is a classical and yet challenging problem. Recently, data-driven approaches have become popular to tackle this challenge. These approaches are appealing since they do require…

Statistics Theory · Mathematics · 2025-10-22 · Jonathan Chirinos Rodriguez, Ernesto De Vito, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

Robust Learning in Heterogeneous Contexts

We consider the problem of learning from training data obtained in different contexts, where the underlying context distribution is unknown and is estimated empirically. We develop a robust method that takes into account the uncertainty of…

Machine Learning · Statistics · 2022-02-18 · Muhammad Osama, Dave Zachariah, Petre Stoica

Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach

In this article, we propose a novel pessimism-based Bayesian learning method for optimal dynamic treatment regimes in the offline setting. When the coverage condition does not hold, which is common for offline data, the existing solutions…

Machine Learning · Statistics · 2023-02-23 · Yunzhe Zhou, Zhengling Qi, Chengchun Shi, Lexin Li

The Illusion of Learning from Observational Data: An Empirical Bayes Perspective

Randomized experiments have long been the gold standard for scientists seeking to learn about cause and effect. When randomized experiments are infeasible, scientists often resort to observational studies, which are widely available and…

Methodology · Statistics · 2026-04-13 · Bohan Wu, Sebastian Salazar, Donald P. Green, David M. Blei

Learning Optimal Features via Partial Invariance

Learning models that are robust to distribution shifts is a key concern in the context of their real-life applicability. Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments.…

Machine Learning · Computer Science · 2023-04-04 · Moulik Choraria, Ibtihal Ferwana, Ankur Mani, Lav R. Varshney

Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can…

Machine Learning · Computer Science · 2019-07-01 · Jessa Bekker, Pieter Robberechts, Jesse Davis

An information-theoretic learning model based on importance sampling

A crucial assumption underlying most current machine learning theory is that the training distribution is identical to the test distribution. However, this assumption may not hold in some real-world applications. In this paper, we…

Machine Learning · Statistics · 2023-02-24 · Jiangshe Zhang, Lizhen Ji, Fei Gao, Mengyao Li