Related papers: On Upper-Confidence Bound Policies for Non-Station…

Nonstationary Stochastic Multiarmed Bandits: UCB Policies and Minimax Regret

We study the nonstationary stochastic Multi-Armed Bandit (MAB) problem in which the reward distribution associated with each arm is assumed to be time-varying and the total variation in the expected rewards is subject to a variation…

Machine Learning · Computer Science 2021-01-25 Lai Wei , Vaibhav Srivastava

On Distributed Multi-player Multiarmed Bandit Problems in Abruptly Changing Environment

We study the multi-player stochastic multiarmed bandit (MAB) problem in an abruptly changing environment. We consider a collision model in which a player receives reward at an arm if it is the only player to select the arm. We design two…

Machine Learning · Statistics 2018-12-14 Lai Wei , Vaibhav Srivastava

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB)…

Machine Learning · Computer Science 2021-10-27 Anand Kalvit , Assaf Zeevi

Discrepancy-Based Algorithms for Non-Stationary Rested Bandits

We study the multi-armed bandit problem where the rewards are realizations of general non-stationary stochastic processes, a setting that generalizes many existing lines of work and analyses. In particular, we present a theoretical analysis…

Machine Learning · Computer Science 2020-09-04 Corinna Cortes , Giulia DeSalvo , Vitaly Kuznetsov , Mehryar Mohri , Scott Yang

Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where…

Machine Learning · Statistics 2019-01-25 Yang Cao , Zheng Wen , Branislav Kveton , Yao Xie

Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals

We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to…

Machine Learning · Computer Science 2023-08-16 Marco Heyden , Vadim Arzamasov , Edouard Fouché , Klemens Böhm

Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's…

Machine Learning · Computer Science 2019-06-11 Omar Besbes , Yonatan Gur , Assaf Zeevi

Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays

We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms, but also their order, which influences reward distribution. In several problem formulations…

Machine Learning · Computer Science 2015-07-20 Aleksandr Vorobev , Gleb Gusev

UCB Algorithm for Exponential Distributions

We introduce in this paper a new algorithm for Multi-Armed Bandit (MAB) problems, a machine learning paradigm popular within Cognitive Network related topics (e.g., Spectrum Sensing and Allocation). We focus on the case where the rewards…

Machine Learning · Statistics 2012-04-10 Wassim Jouini , Christophe Moy

Regional Multi-Armed Bandits

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when…

Machine Learning · Computer Science 2018-02-23 Zhiyang Wang , Ruida Zhou , Cong Shen

On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems

We study the non-stationary stochastic multiarmed bandit (MAB) problem and propose two generic algorithms, namely, the limited memory deterministic sequencing of exploration and exploitation (LM-DSEE) and the Sliding-Window Upper Confidence…

Machine Learning · Statistics 2018-04-25 Lai Wei , Vaibhav Srivastava

Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm

We study the corrupted bandit problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent adversary or Nature. To be specific, the reward…

Machine Learning · Computer Science 2023-03-22 Debabrota Basu , Odalric-Ambrym Maillard , Timothée Mathieu

Rising Rested Bandits: Lower Bounds and Efficient Algorithms

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e. those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested…

Machine Learning · Statistics 2024-11-28 Marco Fiandri , Alberto Maria Metelli , Francesco Trovò

Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards

We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time dependent random graphs provided by an environment. The reward distributions of each arm vary across clients and rewards are…

Machine Learning · Computer Science 2023-10-19 Mengfan Xu , Diego Klabjan

On Adaptive Estimation for Dynamic Bernoulli Bandits

The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm…

Machine Learning · Statistics 2018-05-16 Xue Lu , Niall Adams , Nikolas Kantas

Influential Bandits: Pulling an Arm May Change the Environment

While classical formulations of multi-armed bandit problems assume that each arm's reward is independent and stationary, real-world applications often involve non-stationary environments and interdependencies between arms. In particular,…

Machine Learning · Computer Science 2025-06-19 Ryoma Sato , Shinji Ito

Query-Reward Tradeoffs in Multi-Armed Bandits

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis , Yonathan Efroni , Shie Mannor

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope 2009, Yu 2011). The set of arms is either…

Machine Learning · Computer Science 2014-05-21 Richard Combes , Alexandre Proutiere

Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit

We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown…

Machine Learning · Computer Science 2023-06-12 Xiaotong Cheng , Setareh Maghsudi

Multi-armed Bandit Problem with Known Trend

We consider a variant of the multi-armed bandit model, which we call multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated…

Machine Learning · Computer Science 2017-05-15 Djallel Bouneffouf , Raphaël Feraud