Related papers: Sample Selection Bias Correction Theory

200 papers

In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to…

Machine Learning · Computer Science 2023-09-15 Boris Prokhorov, Diana Koldasbayeva, Alexey Zaytsev

A data set sampled from a certain population is biased if the subgroups of the population are sampled at proportions that are significantly different from their underlying proportions. Training machine learning models on biased data sets…

Machine Learning · Computer Science 2021-08-30 Jing An, Lexing Ying, Yuhua Zhu

A learned generative model often produces biased statistics relative to the underlying data distribution. A standard technique to correct this bias is importance sampling, where samples from the model are weighted by the likelihood ratio…

Machine Learning · Statistics 2019-11-05 Aditya Grover, Jiaming Song, Alekh Agarwal, Kenneth Tran, Ashish Kapoor, Eric Horvitz, Stefano Ermon

Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in…

The underlying assumption of many machine learning algorithms is that the training data and test data are drawn from the same distribution. However, the assumption is often violated in the real world due to the sample selection bias between…

Machine Learning · Computer Science 2021-05-26 Wei Du, Xintao Wu

Given a supervised machine learning problem where the training set has been subject to a known sampling bias, how can a model be trained to fit the original dataset? We achieve this through the Bayesian inference framework by altering the…

Machine Learning · Statistics 2022-03-16 Max Sklar

We consider the problem of learning linear prediction models with model misspecification bias. In such a case, the collinearity among input variables may inflate the error of parameter estimation, resulting in instability of prediction…

Machine Learning · Computer Science 2019-12-02 Zheyan Shen, Peng Cui, Tong Zhang, Kun Kuang

The effectiveness of non-parametric, kernel-based methods for function estimation comes at the price of high computational complexity, which hinders their applicability in adaptive, model-based control. Motivated by approximation techniques…

Statistics Theory · Mathematics 2023-03-17 Anna Scampicchio, Elena Arcari, Melanie N. Zeilinger

Machine learning models trained on real-world data may inadvertently make biased predictions that negatively impact marginalized communities. Reweighting, which assigns a weight to each data point used during model training, can mitigate…

Machine Learning · Computer Science 2026-03-20 Anil K. Saini, Jose Guadalupe Hernandez, Emily F. Wong, Debanshi Misra, Tiffani J. Bright, Jason H. Moore

Datasets are rarely a realistic approximation of the target population. For example, prevalence may be misrepresented, or image quality may be above clinical standards. This mismatch is known as sampling bias. Sampling biases are a major hindrance for…

We derive a family of loss functions to train models in the presence of sampling bias. Examples are when the prevalence of a pathology differs from its sampling rate in the training dataset, or when a machine learning practitioner rebalances…

Bias in datasets can be very detrimental for appropriate statistical estimation. In response to this problem, importance weighting methods have been developed to match any biased distribution to its corresponding target unbiased…

Machine Learning · Computer Science 2022-09-12 Antoine de Mathelin, Francois Deheeger, Mathilde Mougeot, Nicolas Vayatis

This paper considers the evaluation of discretely distributed treatments when outcomes are only observed for a subpopulation due to sample selection or outcome attrition. For identification, we combine a selection-on-observables assumption…

Econometrics · Economics 2021-07-16 Michela Bia, Martin Huber, Lukáš Lafférs

In machine learning, it is commonly assumed that training and test data share the same population distribution. However, this assumption is often violated in practice because the sample selection bias may induce the distribution shift from…

Machine Learning · Computer Science 2020-06-09 Kun Kuang, Hengtao Zhang, Fei Wu, Yueting Zhuang, Aijun Zhang

A common assumption in machine learning is that samples are independently and identically distributed (i.i.d.). However, the contributions of different samples are not identical in training. Some samples are difficult to learn and some…

Machine Learning · Computer Science 2021-11-23 Ou Wu, Weiyao Zhu, Yingjun Deng, Haixiang Zhang, Qinghu Hou

This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods. Accounting for bias is an important obstacle in recent efforts to develop statistical inference for machine learning methods. We…

Machine Learning · Statistics 2015-06-02 Giles Hooker, Lucas Mentch

Distributed learning is an effective way to analyze big data. In distributed regression, a typical approach is to divide the big data into multiple blocks, apply a base regression algorithm on each of them, and then simply average the…

Machine Learning · Computer Science 2017-08-08 Zhengchu Guo, Lei Shi, Qiang Wu

A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution. However, such an assumption is often violated in the real world due to non-stationarity of the…

Machine Learning · Computer Science 2021-05-04 Tianyi Zhang, Ikko Yamane, Nan Lu, Masashi Sugiyama

Learning-based and data-driven techniques have recently become a subject of primary interest in the field of reconstruction and regularization of inverse problems. Besides the development of novel methods, yielding excellent results in…

Machine Learning · Statistics 2023-12-22 Luca Ratti

Research on bias in machine learning algorithms has generally been concerned with the impact of bias on predictive accuracy. We believe that there are other factors that should also play a role in the evaluation of bias. One such factor is…

Machine Learning · Computer Science 2007-05-23 Peter D. Turney