Related papers: Sample Selection Bias Correction Theory

Correcting sampling biases via importance reweighting for spatial modeling

In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to…

Machine Learning · Computer Science 2023-09-15 Boris Prokhorov, Diana Koldasbayeva, Alexey Zaytsev

Why resampling outperforms reweighting for correcting sampling bias with stochastic gradients

A data set sampled from a certain population is biased if the subgroups of the population are sampled at proportions that are significantly different from their underlying proportions. Training machine learning models on biased data sets…

Machine Learning · Computer Science 2021-08-30 Jing An, Lexing Ying, Yuhua Zhu

Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting

A learned generative model often produces biased statistics relative to the underlying data distribution. A standard technique to correct this bias is importance sampling, where samples from the model are weighted by the likelihood ratio…

Machine Learning · Statistics 2019-11-05 Aditya Grover, Jiaming Song, Alekh Agarwal, Kenneth Tran, Ashish Kapoor, Eric Horvitz, Stefano Ermon

Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference

Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in…

Machine Learning · Statistics 2026-04-21 Jonas Arruda, Sophie Chervet, Paula Staudt, Andreas Wieser, Michael Hoelscher, Isabelle Sermet-Gaudelus, Nadine Binder, Lulla Opatowski, Jan Hasenauer

Robust Fairness-aware Learning Under Sample Selection Bias

The underlying assumption of many machine learning algorithms is that the training data and test data are drawn from the same distributions. However, the assumption is often violated in the real world due to the sample selection bias between…

Machine Learning · Computer Science 2021-05-26 Wei Du, Xintao Wu

Sampling Bias Correction for Supervised Machine Learning: A Bayesian Inference Approach with Practical Applications

Given a supervised machine learning problem where the training set has been subject to a known sampling bias, how can a model be trained to fit the original dataset? We achieve this through the Bayesian inference framework by altering the…

Machine Learning · Statistics 2022-03-16 Max Sklar

Stable Learning via Sample Reweighting

We consider the problem of learning linear prediction models with model misspecification bias. In such a case, the collinearity among input variables may inflate the error of parameter estimation, resulting in instability of prediction…

Machine Learning · Computer Science 2019-12-02 Zheyan Shen, Peng Cui, Tong Zhang, Kun Kuang

Error analysis of regularized trigonometric linear regression with unbounded sampling: a statistical learning viewpoint

The effectiveness of non-parametric, kernel-based methods for function estimation comes at the price of high computational complexity, which hinders their applicability in adaptive, model-based control. Motivated by approximation techniques…

Statistics Theory · Mathematics 2023-03-17 Anna Scampicchio, Elena Arcari, Melanie N. Zeilinger

Evolved Sample Weights for Bias Mitigation: Effectiveness Depends on the Fairness Objective

Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Reweighting, which assigns a weight to each data point used during model training, can mitigate…

Machine Learning · Computer Science 2026-03-20 Anil K. Saini, Jose Guadalupe Hernandez, Emily F. Wong, Debanshi Misra, Tiffani J. Bright, Jason H. Moore

Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data

Datasets are rarely a realistic approximation of the target population. Say, prevalence is misrepresented, image quality is above clinical standards, etc. This mismatch is known as sampling bias. Sampling biases are a major hindrance for…

Machine Learning · Computer Science 2021-08-03 Loic Le Folgoc, Vasileios Baltatzis, Amir Alansary, Sujal Desai, Anand Devaraj, Sam Ellis, Octavio E. Martinez Manzanera, Fahdi Kanavati, Arjun Nair, Julia Schnabel, Ben Glocker

Bayesian Sampling Bias Correction: Training with the Right Loss Function

We derive a family of loss functions to train models in the presence of sampling bias. Examples are when the prevalence of a pathology differs from its sampling rate in the training dataset, or when a machine learning practitioner rebalances…

Machine Learning · Computer Science 2020-06-25 L. Le Folgoc, V. Baltatzis, A. Alansary, S. Desai, A. Devaraj, S. Ellis, O. E. Martinez Manzanera, F. Kanavati, A. Nair, J. Schnabel, B. Glocker

Fast and Accurate Importance Weighting for Correcting Sample Bias

Bias in datasets can be very detrimental for appropriate statistical estimation. In response to this problem, importance weighting methods have been developed to match any biased distribution to its corresponding target unbiased…

Machine Learning · Computer Science 2022-09-12 Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

Double machine learning for sample selection models

This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption…

Econometrics · Economics 2021-07-16 Michela Bia, Martin Huber, Lukáš Lafférs

Balance-Subsampled Stable Prediction

In machine learning, it is commonly assumed that training and test data share the same population distribution. However, this assumption is often violated in practice because the sample selection bias may induce the distribution shift from…

Machine Learning · Computer Science 2020-06-09 Kun Kuang, Hengtao Zhang, Fei Wu, Yueting Zhuang, Aijun Zhang

A Mathematical Foundation for Robust Machine Learning based on Bias-Variance Trade-off

A common assumption in machine learning is that samples are independently and identically distributed (i.i.d.). However, the contributions of different samples are not identical in training. Some samples are difficult to learn and some…

Machine Learning · Computer Science 2021-11-23 Ou Wu, Weiyao Zhu, Yingjun Deng, Haixiang Zhang, Qinghu Hou

Bootstrap Bias Corrections for Ensemble Methods

This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods. Accounting for bias is an important obstacle in recent efforts to develop statistical inference for machine learning methods. We…

Machine Learning · Statistics 2015-06-02 Giles Hooker, Lucas Mentch

Learning Theory of Distributed Regression with Bias Corrected Regularization Kernel Network

Distributed learning is an effective way to analyze big data. In distributed regression, a typical approach is to divide the big data into multiple blocks, apply a base regression algorithm on each of them, and then simply average the…

Machine Learning · Computer Science 2017-08-08 Zhengchu Guo, Lei Shi, Qiang Wu

A One-step Approach to Covariate Shift Adaptation

A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution. However, such an assumption is often violated in the real world due to non-stationarity of the…

Machine Learning · Computer Science 2021-05-04 Tianyi Zhang, Ikko Yamane, Nan Lu, Masashi Sugiyama

Learned reconstruction methods for inverse problems: sample error estimates

Learning-based and data-driven techniques have recently become a subject of primary interest in the field of reconstruction and regularization of inverse problems. Besides the development of novel methods, yielding excellent results in…

Machine Learning · Statistics 2023-12-22 Luca Ratti

Technical Note: Bias and the Quantification of Stability

Research on bias in machine learning algorithms has generally been concerned with the impact of bias on predictive accuracy. We believe that there are other factors that should also play a role in the evaluation of bias. One such factor is…

Machine Learning · Computer Science 2007-05-23 Peter D. Turney