Related papers: Statistical Learning from Biased Training Samples

Learning from a Biased Sample

The empirical risk minimization approach to data-driven decision making requires access to training data drawn under the same conditions as those that will be faced when the decision rule is deployed. However, in a number of settings, we…

Methodology · Statistics 2025-09-17 Roshni Sahoo, Lihua Lei, Stefan Wager

Conditional Risk Minimization for Stochastic Processes

We study the task of learning from non-i.i.d. data. In particular, we aim at learning predictors that minimize the conditional risk for a stochastic process, i.e. the expected loss of the predictor on the next point conditioned on the set…

Machine Learning · Statistics 2016-03-15 Alexander Zimin, Christoph H. Lampert

Learning from Survey Training Samples: Rate Bounds for Horvitz-Thompson Risk Minimizers

The generalization ability of minimizers of the empirical risk in the context of binary classification has been investigated under a wide variety of complexity assumptions for the collection of classifiers over which optimization is…

Statistics Theory · Mathematics 2019-01-21 Stephan Clémençon, Patrice Bertail, Guillaume Papa

Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution)…

Machine Learning · Statistics 2020-02-20 Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier

Taylor Learning

Empirical risk minimization stands behind most optimization in supervised machine learning. Under this scheme, labeled data is used to approximate an expected cost (risk), and a learning algorithm updates model-defining parameters in search…

Machine Learning · Statistics 2023-05-25 James Schmidt

Statistical Learning under Nonstationary Mixing Processes

We study a special case of the problem of statistical learning without the i.i.d. assumption. Specifically, we suppose a learning method is presented with a sequence of data points, and required to make a prediction (e.g., a classification)…

Machine Learning · Computer Science 2018-05-22 Steve Hanneke, Liu Yang

Model Debiasing by Learnable Data Augmentation

Deep Neural Networks are well known for efficiently fitting training data, yet experiencing poor generalization capabilities whenever some kind of bias dominates over the actual task labels, resulting in models learning "shortcuts". In…

Machine Learning · Computer Science 2024-08-12 Pietro Morerio, Ruggero Ragonesi, Vittorio Murino

Representation Learning with Statistical Independence to Mitigate Bias

Presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications that has alluded to pivotal debates in recent years. Such challenges range from spurious associations between…

Computer Vision and Pattern Recognition · Computer Science 2020-11-23 Ehsan Adeli, Qingyu Zhao, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Juan Carlos Niebles, Kilian M. Pohl

Empirical Risk Minimization for Time Series: Nonparametric Performance Bounds for Prediction

Empirical risk minimization is a standard principle for choosing algorithms in learning theory. In this paper we study the properties of empirical risk minimization for time series. The analysis is carried out in a general framework that…

Machine Learning · Statistics 2021-08-12 Christian Brownlees, Jordi Llorens-Terrazas

Robust Risk Minimization for Statistical Learning

We consider a general statistical learning problem where an unknown fraction of the training data is corrupted. We develop a robust learning method that only requires specifying an upper bound on the corrupted data fraction. The method…

Machine Learning · Statistics 2020-02-10 Muhammad Osama, Dave Zachariah, Peter Stoica

On Learning the Optimal Regularization Parameter in Inverse Problems

Selecting the best regularization parameter in inverse problems is a classical and yet challenging problem. Recently, data-driven approaches have become popular to tackle this challenge. These approaches are appealing since they do require…

Statistics Theory · Mathematics 2025-10-22 Jonathan Chirinos Rodriguez, Ernesto De Vito, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

Learning Not to Learn: Training Deep Neural Networks with Biased Data

We propose a novel regularization algorithm to train deep neural networks, in which data at training time is severely biased. Since a neural network efficiently learns data distribution, a network is likely to learn the bias information to…

Computer Vision and Pattern Recognition · Computer Science 2019-04-16 Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, Junmo Kim

Statistical Learning from Attribution Sets

We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the…

Machine Learning · Computer Science 2026-02-09 Lorne Applebaum, Robert Busa-Fekete, August Y. Chen, Claudio Gentile, Tomer Koren, Aryan Mokhtari

Statistically Testing Training Data for Unwanted Error Patterns using Rule-Oriented Regression

Artificial intelligence models trained from data can only be as good as the underlying data is. Biases in training data propagating through to the output of a machine learning model are a well-documented and well-understood phenomenon, but…

Machine Learning · Computer Science 2025-04-02 Stefan Rass, Martin Dallinger

Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data

Datasets are rarely a realistic approximation of the target population. Say, prevalence is misrepresented, image quality is above clinical standards, etc. This mismatch is known as sampling bias. Sampling biases are a major hindrance for…

Machine Learning · Computer Science 2021-08-03 Loic Le Folgoc, Vasileios Baltatzis, Amir Alansary, Sujal Desai, Anand Devaraj, Sam Ellis, Octavio E. Martinez Manzanera, Fahdi Kanavati, Arjun Nair, Julia Schnabel, Ben Glocker

Towards Bayesian Data Selection

A wide range of machine learning algorithms iteratively add data to the training sample. Examples include semi-supervised learning, active learning, multi-armed bandits, and Bayesian optimization. We embed this kind of data addition into…

Machine Learning · Statistics 2024-06-25 Julian Rodemann

Theoretical bounds on estimation error for meta-learning

Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent success in few-shot learning and related problems are encouraging signs that these models…

Machine Learning · Statistics 2020-10-15 James Lucas, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel

Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As class of algorithms we consider Stochastic Gradient Descent on the true risk regularized by the…

Machine Learning · Computer Science 2019-03-26 Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil

Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can…

Machine Learning · Computer Science 2019-07-01 Jessa Bekker, Pieter Robberechts, Jesse Davis

Debiasing Machine Learning Models by Using Weakly Supervised Learning

We tackle the problem of bias mitigation of algorithmic decisions in a setting where both the output of the algorithm and the sensitive variable are continuous. Most of prior work deals with discrete sensitive variables, meaning that the…

Machine Learning · Computer Science 2024-02-26 Renan D. B. Brotto, Jean-Michel Loubes, Laurent Risser, Jean-Pierre Florens, Kenji Nose-Filho, João M. T. Romano