Related papers: Multiarmed Bandit Problems with Delayed Feedback

Best arm identification in multi-armed bandits with delayed feedback

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample…

Machine Learning · Computer Science 2018-03-30 Aditya Grover, Todor Markov, Peter Attia, Norman Jin, Nicholas Perkins, Bryan Cheong, Michael Chen, Zi Yang, Stephen Harris, William Chueh, Stefano Ermon

Budgeted Recommendation with Delayed Feedback

In a conventional contextual multi-armed bandit problem, the feedback (or reward) is immediately observable after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in…

Machine Learning · Computer Science 2024-05-21 Kweiguu Liu, Setareh Maghsudi

Biased Dueling Bandits with Stochastic Delayed Feedback

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi, Yue Kang, Yao Li

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards,…

Machine Learning · Computer Science 2021-06-07 Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

Unknown Delay for Adversarial Bandit Setting with Multiple Play

This paper addresses the problem of unknown delays in adversarial multi-armed bandit (MAB) with multiple play. Existing work on similar game settings focused only on the case where the learner selects an arm in each round. However, there are…

Machine Learning · Computer Science 2020-10-02 Olusola T. Odeyomi

Adversarial Bandits with Multi-User Delayed Feedback: Theory and Application

The multi-armed bandit (MAB) models have attracted significant research attention due to their applicability and effectiveness in various real-world scenarios such as resource allocation, online advertising, and dynamic pricing. As an…

Machine Learning · Computer Science 2024-02-13 Yandi Li, Jianxiong Guo, Yupeng Li, Tian Wang, Weijia Jia

Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards

We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some mild assumptions on the probability distributions for the delays and using an appropriate randomization…

Machine Learning · Statistics 2019-09-06 Sakshi Arya, Yuhong Yang

Bandit Learning with Delayed Impact of Actions

We consider a stochastic multi-armed bandit (MAB) problem with delayed impact of actions. In our setting, actions taken in the past impact the arm rewards in the subsequent future. This delayed impact of actions is prevalent in the real…

Machine Learning · Computer Science 2021-11-02 Wei Tang, Chien-Ju Ho, Yang Liu

A Bandit Learning Method for Continuous Games under Feedback Delays with Residual Pseudo-Gradient Estimate

Learning in multi-player games can model a large variety of practical scenarios, where each player seeks to optimize its own local objective function, which at the same time relies on the actions taken by others. Motivated by the frequent…

Optimization and Control · Mathematics 2023-09-08 Yuanhanqing Huang, Jianghai Hu

Bandit Online Learning with Unknown Delays

This paper deals with bandit online learning problems involving feedback of unknown delay that can emerge in multi-armed bandit (MAB) and bandit convex optimization (BCO) settings. MAB and BCO require only values of the objective function…

Machine Learning · Computer Science 2019-05-29 Bingcong Li, Tianyi Chen, Georgios B. Giannakis

Adaptive Exploration for Latent-State Bandits

The multi-armed bandit problem is a core framework for sequential decision-making under uncertainty, but classical algorithms often fail in environments with hidden, time-varying states that confound reward estimation and optimal action…

Machine Learning · Computer Science 2026-02-19 Jikai Jin, Kenneth Hung, Sanath Kumar Krishnamurthy, Baoyi Shi, Congshan Zhang

Linear Bandits with Stochastic Delayed Feedback

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by…

Machine Learning · Statistics 2020-03-03 Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Combinatorial Blocking Bandits with Stochastic Delays

Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable)…

Machine Learning · Computer Science 2021-05-25 Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai

Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback

Equitably allocating limited resources in high-stakes domains, such as education, employment, and healthcare, requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and…

Artificial Intelligence · Computer Science 2025-11-17 Mohammadsina Almasi, Hadis Anahideh

Delayed Feedback in Generalised Linear Bandits Revisited

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for…

Machine Learning · Computer Science 2023-04-12 Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application

This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm…

Machine Learning · Computer Science 2023-07-28 Jianjun Yuan, Wei Lee Woon, Ludovik Coba

Bandits with Delayed, Aggregated Anonymous Feedback

We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated, however it is not immediately…

Machine Learning · Statistics 2018-06-14 Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvari, Steffen Grunewalder

Bayesian Optimization -- Multi-Armed Bandit Problem

In this report, we survey Bayesian Optimization methods focussed on the Multi-Armed Bandit Problem. We take the help of the paper "Portfolio Allocation for Bayesian Optimization". We report a small literature survey on the acquisition…

Machine Learning · Computer Science 2020-12-16 Abhilash Nandy, Chandan Kumar, Deepak Mewada, Soumya Sharma

Minimax Optimal Algorithms for Adversarial Bandit Problem with Multiple Plays

We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching $m$-arm strategy with minimax optimal…

Machine Learning · Computer Science 2019-12-02 N. Mert Vural, Hakan Gokcesu, Kaan Gokcesu, Suleyman S. Kozat

Statistical Inference on Multi-armed Bandits with Delayed Feedback

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback.…

Methodology · Statistics 2023-07-04 Lei Shi, Jingshen Wang, Tianhao Wu