Related papers: Debiasing Samples from Online Learning Using Boots…

Bootstrapping Upper Confidence Bound

The Upper Confidence Bound (UCB) method is arguably the most celebrated method used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration…

Machine Learning · Statistics 2019-11-01 Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

Learning to Optimize Via Posterior Sampling

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling,…

Machine Learning · Computer Science 2014-02-04 Daniel Russo, Benjamin Van Roy

Look Beyond Bias with Entropic Adversarial Data Augmentation

Deep neural networks do not discriminate between spurious and causal patterns, and will only learn the most predictive ones while ignoring the others. This shortcut learning behaviour is detrimental to a network's ability to generalize to…

Machine Learning · Computer Science 2023-01-11 Thomas Duboudin, Emmanuel Dellandréa, Corentin Abgrall, Gilles Hénaff, Liming Chen

Online Bandits with (Biased) Offline Data: Adaptive Learning under Distribution Mismatch

Traditional online learning models are typically initialized from scratch. By contrast, contemporary real-world applications often have access to historical datasets that can potentially enhance the online learning processes. We study how…

Machine Learning · Computer Science 2025-12-19 Wang Chi Cheung, Lixing Lyu

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

The literature on bandit learning and regret analysis has focused on contexts where the goal is to converge on an optimal action in a manner that limits exploration costs. One shortcoming imposed by this orientation is that it does not…

Machine Learning · Computer Science 2017-05-01 Daniel Russo, David Tse, Benjamin Van Roy

Delay-Adaptive Learning in Generalized Linear Contextual Bandits

In this paper, we consider online learning in generalized linear contextual bandits where rewards are not immediately observed. Instead, rewards are available to the decision-maker only after some delay, which is unknown and stochastic. We…

Machine Learning · Computer Science 2020-03-12 Jose Blanchet, Renyuan Xu, Zhengyuan Zhou

AutoDebias: Learning to Debias for Recommendation

Recommender systems rely on user behavior data like ratings and clicks to build personalization models. However, the collected data is observational rather than experimental, causing various biases in the data which significantly affect the…

Machine Learning · Computer Science 2021-10-29 Jiawei Chen, Hande Dong, Yang Qiu, Xiangnan He, Xin Xin, Liang Chen, Guli Lin, Keping Yang

Jump-teaching: Combating Sample Selection Bias via Temporal Disagreement

Sample selection is a straightforward technique to combat noisy labels, aiming to prevent mislabeled samples from degrading the robustness of neural networks. However, existing methods mitigate compounding selection bias either by…

Computer Vision and Pattern Recognition · Computer Science 2026-01-16 Kangye Ji, Fei Cheng, Zeqing Wang, Qichang Zhang, Bohu Huang

Online Multi-Armed Bandits with Adaptive Inference

During online decision making in Multi-Armed Bandits (MAB), one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step. However, since the arms are adaptively selected--thereby yielding…

Machine Learning · Computer Science 2021-06-29 Maria Dimakopoulou, Zhimei Ren, Zhengyuan Zhou

Unbiased Test Error Estimation in the Poisson Means Problem via Coupled Bootstrap Techniques

We propose a coupled bootstrap (CB) method for the test error of an arbitrary algorithm that estimates the mean in a Poisson sequence, often called the Poisson means problem. The idea behind our method is to generate two carefully-designed…

Methodology · Statistics 2024-08-20 Natalia L. Oliveira, Jing Lei, Ryan J. Tibshirani

Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing

Machine learning models achieve state-of-the-art performance on many supervised learning tasks. However, prior evidence suggests that these models may learn to rely on shortcut biases or spurious correlations (intuitively, correlations that…

Machine Learning · Computer Science 2021-08-31 Sindhu C. M. Gowda, Shalmali Joshi, Haoran Zhang, Marzyeh Ghassemi

Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift

Offline-to-online learning aims to improve online decision-making by leveraging offline logged data. A central challenge in this setting is the distribution shift between offline and online environments. While some existing works attempt to…

Machine Learning · Computer Science 2026-05-15 Bochao Li, Yao Fu, Wei Chen, Fang Kong

Position Bias Estimation for Unbiased Learning-to-Rank in eCommerce Search

The Unbiased Learning-to-Rank framework has been recently proposed as a general approach to systematically remove biases, such as position bias, from learning-to-rank models. The method takes two steps - estimating click propensities and…

Information Retrieval · Computer Science 2019-10-23 Grigor Aslanyan, Utkarsh Porwal

Denoising after Entropy-based Debiasing: A Robust Training Method for Dataset Bias with Noisy Labels

Improperly constructed datasets can result in inaccurate inferences. For instance, models trained on biased datasets perform poorly in terms of generalization (i.e., dataset bias). Recent debiasing techniques have successfully achieved…

Machine Learning · Computer Science 2022-12-05 Sumyeong Ahn, Se-Young Yun

Residual Bootstrap Exploration for Stochastic Linear Bandit

We propose a new bootstrap-based online algorithm for stochastic linear bandit problems. The key idea is to adopt residual bootstrap exploration, in which the agent estimates the next step reward by re-sampling the residuals of mean reward…

Machine Learning · Statistics 2022-06-20 Shuang Wu, Chi-Hua Wang, Yuantong Li, Guang Cheng

Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback

In most real-world recommender systems, the observed rating data are subject to selection bias, and the data are thus missing-not-at-random. Developing a method to facilitate the learning of a recommender with biased feedback is one of the…

Social and Information Networks · Computer Science 2022-06-16 Yuta Saito

Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling

Machine learning models are increasingly used to produce predictions that serve as input data in subsequent statistical analyses. For example, computer vision predictions of economic and environmental indicators based on satellite imagery…

Methodology · Statistics 2025-11-18 Dan M. Kluger, Kerri Lu, Tijana Zrnic, Sherrie Wang, Stephen Bates

Online Bootstrap Inference for the Trend of Nonstationary Time Series

This article proposes an online bootstrap scheme for nonparametric level estimation in nonstationary time series. Our approach applies to a broad class of level estimators expressible as weighted sample averages over time windows, including…

Methodology · Statistics 2026-03-02 Thomas Nagler, Tobias Brock, Nicolai Palm

Thompson Sampling with Diffusion Generative Prior

In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs…

Machine Learning · Computer Science 2023-01-31 Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

Meta-Thompson Sampling

Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns…

Machine Learning · Computer Science 2021-06-24 Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari