Related papers: Regret Analysis of Stochastic and Nonstochastic Mu…

Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's…

Machine Learning · Computer Science 2019-06-11 Omar Besbes, Yonatan Gur, Assaf Zeevi

Query-Reward Tradeoffs in Multi-Armed Bandits

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis, Yonathan Efroni, Shie Mannor

Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits

The multi-armed bandit (MAB) model is one of the most classical models to study decision-making in an uncertain environment. In this model, a player chooses one of $K$ possible arms of a bandit machine to play at each time step, where the…

Machine Learning · Computer Science 2023-06-13 Bo Li, Chi Ho Yeung

Bandits with Knapsacks

Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in learning, and they have countless applications ranging from medical trials, to communication networks, to Web search and advertising.…

Data Structures and Algorithms · Computer Science 2017-09-06 Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins

On Explore-Then-Commit Strategies

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by…

Statistics Theory · Mathematics 2016-11-15 Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

Stochastic Bandit Based on Empirical Moments

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura

On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits

We consider the problem of learning in single-player and multiplayer multiarmed bandit models. Bandit problems are classes of online learning problems that capture exploration versus exploitation tradeoffs. In a multiarmed bandit model,…

Machine Learning · Statistics 2016-12-02 Naumaan Nayyar, Dileep Kalathil, Rahul Jain

An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit

We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem when the collision reward is unknown. Our results are based on…

Machine Learning · Computer Science 2022-10-04 Aldo Pacchiano, Peter Bartlett, Michael I. Jordan

Multi-Armed Sampling Problem and the End of Exploration

This paper introduces the framework of multi-armed sampling, which serves as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off…

Machine Learning · Computer Science 2026-05-14 Mohammad Pedramfar, Siamak Ravanbakhsh

A Survey of Risk-Aware Multi-Armed Bandits

In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial…

Machine Learning · Statistics 2022-05-13 Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…

Machine Learning · Computer Science 2025-01-28 Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

Recovering Bandits

We study the recovering bandits problem, a variant of the stochastic multi-armed bandit problem where the expected reward of each arm varies according to some unknown function of the time since the arm was last played. While being a natural…

Machine Learning · Statistics 2019-11-01 Ciara Pike-Burke, Steffen Grünewälder

Forced Exploration in Bandit Problems

The multi-armed bandit (MAB) is a classical sequential decision problem. Most work requires assumptions about the reward distribution (e.g., bounded), while practitioners may have difficulty obtaining information about these distributions to…

Machine Learning · Computer Science 2023-12-14 Han Qi, Fei Guo, Li Zhu

Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

In this paper, we consider the distributed stochastic multi-armed bandit problem, where a global arm set can be accessed by multiple players independently. The players are allowed to exchange their history of observations with each other at…

Machine Learning · Computer Science 2020-02-13 Shuang Liu, Cheng Chen, Zhihua Zhang

Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays

We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms, but also their order, which influences reward distribution. In several problem formulations…

Machine Learning · Computer Science 2015-07-20 Aleksandr Vorobev, Gleb Gusev

Global Bandits

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science 2018-03-22 Onur Atan, Cem Tekin, Mihaela van der Schaar

Bandit problems with Levy processes

Bandit problems model the trade-off between exploration and exploitation in various decision problems. We study two-armed bandit problems in continuous time, where the risky arm can have two types: High or Low; both types yield stochastic…

Probability · Mathematics 2015-08-23 Asaf Cohen, Eilon Solan

Regret Lower Bounds in Multi-agent Multi-armed Bandit

Multi-armed Bandit motivates methods with provable upper bounds on regret and also the counterpart lower bounds have been extensively studied in this context. Recently, Multi-agent Multi-armed Bandit has gained significant traction in…

Machine Learning · Computer Science 2023-08-17 Mengfan Xu, Diego Klabjan

Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant possible applications in adaptive clinical trials, which allow for dynamic changes in the treatment…

Machine Learning · Computer Science 2019-06-11 Hossein Aboutalebi, Doina Precup, Tibor Schuster

A Frequency-Domain Analysis of the Multi-Armed Bandit Problem: A New Perspective on the Exploration-Exploitation Trade-off

The stochastic multi-armed bandit (MAB) problem is one of the most fundamental models in sequential decision-making, with the core challenge being the trade-off between exploration and exploitation. Although algorithms such as Upper…

Machine Learning · Computer Science 2025-10-13 Di Zhang