English
Related papers

Related papers: Multiarmed Bandit Problems with Delayed Feedback

200 papers

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample…

In a conventional contextual multi-armed bandit problem, the feedback (or reward) is immediately observable after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in…

Machine Learning · Computer Science 2024-05-21 Kweiguu Liu , Setareh Maghsudi

The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information…

Machine Learning · Computer Science 2025-04-08 Bongsoo Yi , Yue Kang , Yao Li

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards,…

Machine Learning · Computer Science 2021-06-07 Tal Lancewicki , Shahar Segal , Tomer Koren , Yishay Mansour

This paper addresses the problem of unknown delays in adversarial multi-armed bandit (MAB) with multiple play. Existing work on similar game setting focused on only the case where the learner selects an arm in each round. However, there are…

Machine Learning · Computer Science 2020-10-02 Olusola T. Odeyomi

The multi-armed bandit (MAB) models have attracted significant research attention due to their applicability and effectiveness in various real-world scenarios such as resource allocation, online advertising, and dynamic pricing. As an…

Machine Learning · Computer Science 2024-02-13 Yandi Li , Jianxiong Guo , Yupeng Li , Tian Wang , Weijia Jia

We study a multi-armed bandit problem with covariates in a setting where there is a possible delay in observing the rewards. Under some mild assumptions on the probability distributions for the delays and using an appropriate randomization…

Machine Learning · Statistics 2019-09-06 Sakshi Arya , Yuhong Yang

We consider a stochastic multi-armed bandit (MAB) problem with delayed impact of actions. In our setting, actions taken in the past impact the arm rewards in the subsequent future. This delayed impact of actions is prevalent in the real…

Machine Learning · Computer Science 2021-11-02 Wei Tang , Chien-Ju Ho , Yang Liu

Learning in multi-player games can model a large variety of practical scenarios, where each player seeks to optimize its own local objective function, which at the same time relies on the actions taken by others. Motivated by the frequent…

Optimization and Control · Mathematics 2023-09-08 Yuanhanqing Huang , Jianghai Hu

This paper deals with bandit online learning problems involving feedback of unknown delay that can emerge in multi-armed bandit (MAB) and bandit convex optimization (BCO) settings. MAB and BCO require only values of the objective function…

Machine Learning · Computer Science 2019-05-29 Bingcong Li , Tianyi Chen , Georgios B. Giannakis

The multi-armed bandit problem is a core framework for sequential decision-making under uncertainty, but classical algorithms often fail in environments with hidden, time-varying states that confound reward estimation and optimal action…

Machine Learning · Computer Science 2026-02-19 Jikai Jin , Kenneth Hung , Sanath Kumar Krishnamurthy , Baoyi Shi , Congshan Zhang

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by…

Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable)…

Machine Learning · Computer Science 2021-05-25 Alexia Atsidakou , Orestis Papadigenopoulos , Soumya Basu , Constantine Caramanis , Sanjay Shakkottai

Equitably allocating limited resources in high-stakes domains-such as education, employment, and healthcare-requires balancing short-term utility with long-term impact, while accounting for delayed outcomes, hidden heterogeneity, and…

Artificial Intelligence · Computer Science 2025-11-17 Mohammadsina Almasi , Hadis Anahideh

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for…

Machine Learning · Computer Science 2023-04-12 Benjamin Howson , Ciara Pike-Burke , Sarah Filippi

This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm…

Machine Learning · Computer Science 2023-07-28 Jianjun Yuan , Wei Lee Woon , Ludovik Coba

We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated, however it is not immediately…

Machine Learning · Statistics 2018-06-14 Ciara Pike-Burke , Shipra Agrawal , Csaba Szepesvari , Steffen Grunewalder

In this report, we survey Bayesian Optimization methods focussed on the Multi-Armed Bandit Problem. We take the help of the paper "Portfolio Allocation for Bayesian Optimization". We report a small literature survey on the acquisition…

Machine Learning · Computer Science 2020-12-16 Abhilash Nandy , Chandan Kumar , Deepak Mewada , Soumya Sharma

We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching $m$-arm strategy with minimax optimal…

Machine Learning · Computer Science 2019-12-02 N. Mert Vural , Hakan Gokcesu , Kaan Gokcesu , Suleyman S. Kozat

Multi armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback.…

Methodology · Statistics 2023-07-04 Lei Shi , Jingshen Wang , Tianhao Wu
‹ Prev 1 2 3 10 Next ›