Related papers: Statistical Learning from Biased Training Samples

200 papers

The empirical risk minimization approach to data-driven decision making requires access to training data drawn under the same conditions as those that will be faced when the decision rule is deployed. However, in a number of settings, we…

Methodology · Statistics 2025-09-17 Roshni Sahoo, Lihua Lei, Stefan Wager

We study the task of learning from non-i.i.d. data. In particular, we aim at learning predictors that minimize the conditional risk for a stochastic process, i.e. the expected loss of the predictor on the next point conditioned on the set…

Machine Learning · Statistics 2016-03-15 Alexander Zimin, Christoph H. Lampert

The generalization ability of minimizers of the empirical risk in the context of binary classification has been investigated under a wide variety of complexity assumptions for the collection of classifiers over which optimization is…

Statistics Theory · Mathematics 2019-01-21 Clémençon Stephan, Patrice Bertail, Guillaume Papa

We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution)…

Machine Learning · Statistics 2020-02-20 Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier

Empirical risk minimization stands behind most optimization in supervised machine learning. Under this scheme, labeled data is used to approximate an expected cost (risk), and a learning algorithm updates model-defining parameters in search…

Machine Learning · Statistics 2023-05-25 James Schmidt

We study a special case of the problem of statistical learning without the i.i.d. assumption. Specifically, we suppose a learning method is presented with a sequence of data points, and required to make a prediction (e.g., a classification)…

Machine Learning · Computer Science 2018-05-22 Steve Hanneke, Liu Yang

Deep Neural Networks are well known for efficiently fitting training data, yet experiencing poor generalization capabilities whenever some kind of bias dominates over the actual task labels, resulting in models learning "shortcuts". In…

Machine Learning · Computer Science 2024-08-12 Pietro Morerio, Ruggero Ragonesi, Vittorio Murino

Presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications that has alluded to pivotal debates in recent years. Such challenges range from spurious associations between…

Computer Vision and Pattern Recognition · Computer Science 2020-11-23 Ehsan Adeli, Qingyu Zhao, Adolf Pfefferbaum, Edith V. Sullivan, Li Fei-Fei, Juan Carlos Niebles, Kilian M. Pohl

Empirical risk minimization is a standard principle for choosing algorithms in learning theory. In this paper we study the properties of empirical risk minimization for time series. The analysis is carried out in a general framework that…

Machine Learning · Statistics 2021-08-12 Christian Brownlees, Jordi Llorens-Terrazas

We consider a general statistical learning problem where an unknown fraction of the training data is corrupted. We develop a robust learning method that only requires specifying an upper bound on the corrupted data fraction. The method…

Machine Learning · Statistics 2020-02-10 Muhammad Osama, Dave Zachariah, Peter Stoica

Selecting the best regularization parameter in inverse problems is a classical and yet challenging problem. Recently, data-driven approaches have become popular to tackle this challenge. These approaches are appealing since they do require…

Statistics Theory · Mathematics 2025-10-22 Jonathan Chirinos Rodriguez, Ernesto De Vito, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

We propose a novel regularization algorithm to train deep neural networks, in which data at training time is severely biased. Since a neural network efficiently learns data distribution, a network is likely to learn the bias information to…

Computer Vision and Pattern Recognition · Computer Science 2019-04-16 Byungju Kim, Hyunwoo Kim, Kyungsu Kim, Sungjin Kim, Junmo Kim

We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the…

Machine Learning · Computer Science 2026-02-09 Lorne Applebaum, Robert Busa-Fekete, August Y. Chen, Claudio Gentile, Tomer Koren, Aryan Mokhtari

Artificial intelligence models trained from data can only be as good as the underlying data is. Biases in training data propagating through to the output of a machine learning model are a well-documented and well-understood phenomenon, but…

Machine Learning · Computer Science 2025-04-02 Stefan Rass, Martin Dallinger

Datasets are rarely a realistic approximation of the target population. Say, prevalence is misrepresented, image quality is above clinical standards, etc. This mismatch is known as sampling bias. Sampling biases are a major hindrance for…

A wide range of machine learning algorithms iteratively add data to the training sample. Examples include semi-supervised learning, active learning, multi-armed bandits, and Bayesian optimization. We embed this kind of data addition into…

Machine Learning · Statistics 2024-06-25 Julian Rodemann

Machine learning models have traditionally been developed under the assumption that the training and test distributions match exactly. However, recent success in few-shot learning and related problems are encouraging signs that these models…

Machine Learning · Statistics 2020-10-15 James Lucas, Mengye Ren, Irene Kameni, Toniann Pitassi, Richard Zemel

We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As class of algorithms we consider Stochastic Gradient Descent on the true risk regularized by the…

Machine Learning · Computer Science 2019-03-26 Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil

Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can…

Machine Learning · Computer Science 2019-07-01 Jessa Bekker, Pieter Robberechts, Jesse Davis

We tackle the problem of bias mitigation of algorithmic decisions in a setting where both the output of the algorithm and the sensitive variable are continuous. Most of prior work deals with discrete sensitive variables, meaning that the…
