Related papers: Smooth Sequential Optimisation with Delayed Feedba…

200 papers

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards,…

Machine Learning · Computer Science · 2021-06-07 · Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

We propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample…

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for…

Machine Learning · Computer Science · 2023-04-12 · Benjamin Howson, Ciara Pike-Burke, Sarah Filippi

A survey is performed of various Multi-Armed Bandit (MAB) strategies in order to examine their performance in circumstances exhibiting non-stationary stochastic reward functions in conjunction with delayed feedback. We run several MAB…

Machine Learning · Computer Science · 2019-07-31 · Larkin Liu, Richard Downe, Joshua Reid

Non-stationary multi-armed bandits enable agents to adapt to changing environments by incorporating mechanisms to detect and respond to shifts in reward distributions, making them well-suited for dynamic settings. However, existing…

Machine Learning · Computer Science · 2025-09-19 · Shaoang Li, Jian Li

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback.…

Methodology · Statistics · 2023-07-04 · Lei Shi, Jingshen Wang, Tianhao Wu

In this paper we initiate the study of optimization of bandit type problems in scenarios where the feedback of a play is not immediately known. This arises naturally in allocation problems which have been studied extensively in the…

Data Structures and Algorithms · Computer Science · 2015-03-17 · Sudipto Guha, Kamesh Munagala, Martin Pal

The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit…

Machine Learning · Computer Science · 2026-02-12 · Zhongxuan Liu, Yue Kang, Thomas C. M. Lee

We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period the reward interval) and the player receives partial rewards…

Machine Learning · Computer Science · 2020-12-16 · Siwei Wang, Haoyun Wang, Longbo Huang

We consider stochastic bandit problems with a continuous set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected…

Machine Learning · Computer Science · 2015-03-09 · Richard Combes, Alexandre Proutiere

We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of its…

Machine Learning · Computer Science · 2022-03-09 · Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic

The multi-armed bandit (MAB) model is one of the most classical models to study decision-making in an uncertain environment. In this model, a player chooses one of $K$ possible arms of a bandit machine to play at each time step, where the…

Machine Learning · Computer Science · 2023-06-13 · Bo Li, Chi Ho Yeung

We explore a novel setting of the Multi-Armed Bandit (MAB) problem inspired by real-world applications, which we call bandits with "stochastic delayed composite anonymous feedback (SDCAF)". In SDCAF, the rewards on pulling arms are…

Machine Learning · Computer Science · 2019-10-14 · Siddhant Garg, Aditya Kumar Akash

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science · 2015-09-29 · Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm…

Machine Learning · Statistics · 2018-05-16 · Xue Lu, Niall Adams, Nikolas Kantas

Motivated by the fact that humans like some level of unpredictability or novelty, and might therefore get quickly bored when interacting with a stationary policy, we introduce a novel non-stationary bandit problem, where the expected reward…

Machine Learning · Computer Science · 2022-03-08 · Pierre Laforgue, Giulia Clerici, Nicolò Cesa-Bianchi, Ran Gilad-Bachrach

A recent work by Schlisselberg et al. (2024) studies a delay-as-payoff model for stochastic multi-armed bandits, where the payoff (either loss or reward) is delayed for a period that is proportional to the payoff itself. While this captures…

Machine Learning · Computer Science · 2025-02-21 · Mengxiao Zhang, Yingfei Wang, Haipeng Luo

Many stochastic optimization algorithms work by estimating the gradient of the cost function on the fly by sampling datapoints uniformly at random from a training set. However, the estimator might have a large variance, which inadvertently…

Machine Learning · Computer Science · 2017-08-10 · Farnood Salehi, L. Elisa Celis, Patrick Thiran

Real-world applications of reinforcement learning for recommendation and experimentation face a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these…

Machine Learning · Computer Science · 2022-06-29 · Srivas Chennu, Andrew Maher, Jamie Martin, Subash Prabanantham

The multi-armed bandit problem is a core framework for sequential decision-making under uncertainty, but classical algorithms often fail in environments with hidden, time-varying states that confound reward estimation and optimal action…

Machine Learning · Computer Science · 2026-02-19 · Jikai Jin, Kenneth Hung, Sanath Kumar Krishnamurthy, Baoyi Shi, Congshan Zhang