Related papers: UCB Algorithm for Exponential Distributions

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…

Statistics Theory · Mathematics 2008-12-18 Aurélien Garivier, Eric Moulines

On Distributed Cooperative Decision-Making in Multiarmed Bandits

We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises…

Systems and Control · Computer Science 2019-09-17 Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their…

Optimization and Control · Mathematics 2020-08-13 Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

Multi-Armed Bandit Problem and Batch UCB Rule

We obtain the upper bound of the loss function for a strategy in the multi-armed bandit problem with Gaussian distributions of incomes. The considered strategy is an asymptotic generalization of the strategy proposed by J. Bather for the…

Statistics Theory · Mathematics 2019-02-04 Alexander Kolnogorov, Sergey Garbar

Extended UCB Policies for Multi-armed Bandit Problems

The multi-armed bandit (MAB) problems are widely studied in fields of operations research, stochastic optimization, and reinforcement learning. In this paper, we consider the classical MAB model with heavy-tailed reward distributions and…

Machine Learning · Computer Science 2025-09-16 Keqin Liu, Tianshuo Zheng, Zhi-Hua Zhou

Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals

We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to…

Machine Learning · Computer Science 2023-08-16 Marco Heyden, Vadim Arzamasov, Edouard Fouché, Klemens Böhm

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB)…

Machine Learning · Computer Science 2021-10-27 Anand Kalvit, Assaf Zeevi

Regional Multi-Armed Bandits

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when…

Machine Learning · Computer Science 2018-02-23 Zhiyang Wang, Ruida Zhou, Cong Shen

Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where…

Machine Learning · Statistics 2019-01-25 Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

Distributed Consensus Algorithm for Decision-Making in Multi-agent Multi-armed Bandit

We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown…

Machine Learning · Computer Science 2023-06-12 Xiaotong Cheng, Setareh Maghsudi

Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits

Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit…

Machine Learning · Computer Science 2024-06-11 Ambrus Tamás, Szabolcs Szentpéteri, Balázs Csanád Csáji

On Adaptive Estimation for Dynamic Bernoulli Bandits

The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm…

Machine Learning · Statistics 2018-05-16 Xue Lu, Niall Adams, Nikolas Kantas

Dynamic Multi-Arm Bandit Game Based Multi-Agents Spectrum Sharing Strategy Design

For a wireless avionics communication system, a Multi-arm bandit game is mathematically formulated, which includes channel states, strategies, and rewards. The simple case includes only two agents sharing the spectrum which is fully studied…

Signal Processing · Electrical Eng. & Systems 2017-11-15 Jingyang Lu, Lun Li, Dan Shen, Genshe Chen, Bin Jia, Erik Blasch, Khanh Pham

Differentiable Linear Bandit Algorithm

Upper Confidence Bound (UCB) is arguably the most commonly used method for linear multi-arm bandit problems. While conceptually and computationally simple, this method highly relies on the confidence bounds, failing to strike the optimal…

Machine Learning · Computer Science 2020-06-05 Kaige Yang, Laura Toni

Multiarmed Bandits Problem Under the Mean-Variance Setting

The classical multi-armed bandit (MAB) problem involves a learner and a collection of K independent arms, each with its own ex ante unknown independent reward distribution. At each one of a finite number of rounds, the learner selects one…

Optimization and Control · Mathematics 2024-05-07 Hongda Hu, Arthur Charpentier, Mario Ghossoub, Alexander Schied

Nonstationary Stochastic Multiarmed Bandits: UCB Policies and Minimax Regret

We study the nonstationary stochastic Multi-Armed Bandit (MAB) problem in which the distributions of rewards associated with each arm are assumed to be time-varying and the total variation in the expected rewards is subject to a variation…

Machine Learning · Computer Science 2021-01-25 Lai Wei, Vaibhav Srivastava

Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards

Multi-armed bandit (MAB) is a widely adopted framework for sequential decision-making under uncertainty. Traditional bandit algorithms rely solely on online data, which tends to be scarce as it must be gathered during the online phase when…

Statistics Theory · Mathematics 2026-04-23 Wenlong Ji, Yihan Pan, Ruihao Zhu, Lihua Lei

A Frequency-Domain Analysis of the Multi-Armed Bandit Problem: A New Perspective on the Exploration-Exploitation Trade-off

The stochastic multi-armed bandit (MAB) problem is one of the most fundamental models in sequential decision-making, with the core challenge being the trade-off between exploration and exploitation. Although algorithms such as Upper…

Machine Learning · Computer Science 2025-10-13 Di Zhang

Cooperative Bandit Learning in Directed Networks with Arm-Access Constraints

Sequential decision-making under uncertainty often involves multiple agents learning which actions (arms) yield the highest rewards through repeated interaction with a stochastic environment. This setting is commonly modeled by cooperative…

Systems and Control · Electrical Eng. & Systems 2026-03-25 Evagoras Makridis, Themistoklis Charalambous

Multi-armed Bandit Learning on a Graph

The multi-armed bandit (MAB) problem is a simple yet powerful framework that has been extensively studied in the context of decision-making under uncertainty. In many real-world applications, such as robotic applications, selecting an arm…

Machine Learning · Computer Science 2023-03-21 Tianpeng Zhang, Kasper Johansson, Na Li