Related papers: Pure Exploration for Multi-Armed Bandit Problems

Simple regret for infinitely many armed bandits

We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous…

Machine Learning · Computer Science 2015-05-19 Alexandra Carpentier, Michal Valko

Multi-Armed Sampling Problem and the End of Exploration

This paper introduces the framework of multi-armed sampling, which serves as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off…

Machine Learning · Computer Science 2026-05-14 Mohammad Pedramfar, Siamak Ravanbakhsh

Complexity Analysis of a Countable-armed Bandit Problem

We consider a stochastic multi-armed bandit (MAB) problem motivated by "large" action spaces, and endowed with a population of arms containing exactly $K$ arm-types, each characterized by a distinct mean reward. The decision maker is…

Machine Learning · Computer Science 2023-01-19 Anand Kalvit, Assaf Zeevi

Simple Regret Minimization for Contextual Bandits

There are two variants of the classical multi-armed bandit (MAB) problem that have received considerable attention from machine learning researchers in recent years: contextual bandits and simple regret minimization. Contextual bandits are…

Machine Learning · Statistics 2020-02-27 Aniket Anand Deshmukh, Srinagesh Sharma, James W. Cutler, Mark Moldwin, Clayton Scott

Regret Bounds for Batched Bandits

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number…

Data Structures and Algorithms · Computer Science 2020-02-19 Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's…

Machine Learning · Computer Science 2019-06-11 Omar Besbes, Yonatan Gur, Assaf Zeevi

Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

In this paper, we consider the distributed stochastic multi-armed bandit problem, where a global arm set can be accessed by multiple players independently. The players are allowed to exchange their history of observations with each other at…

Machine Learning · Computer Science 2020-02-13 Shuang Liu, Cheng Chen, Zhihua Zhang

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new…

Machine Learning · Computer Science 2012-11-06 Sébastien Bubeck, Nicolò Cesa-Bianchi

Be Greedy in Multi-Armed Bandits

The Greedy algorithm is the simplest heuristic in sequential decision problems that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of…

Machine Learning · Computer Science 2022-10-18 Viktor Bengs, Eyke Hüllermeier

Bandit Regret Scaling with the Effective Loss Range

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved, if the effective range of the losses in each round is small (e.g. the maximal difference between two losses in a given round). Despite a recent…

Machine Learning · Computer Science 2020-01-03 Nicolò Cesa-Bianchi, Ohad Shamir

Thompson Sampling for Complex Bandit Problems

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round. The reward of the complex action is some function of…

Machine Learning · Statistics 2013-11-05 Aditya Gopalan, Shie Mannor, Yishay Mansour

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them…

Machine Learning · Computer Science 2015-11-04 Gergely Neu

Budget-Constrained Bandits over General Cost and Reward Distributions

We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is…

Machine Learning · Computer Science 2020-03-03 Semih Cayci, Atilla Eryilmaz, R. Srikant

Bounded regret in stochastic multi-armed bandits

We study the stochastic multi-armed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a…

Statistics Theory · Mathematics 2013-02-13 Sébastien Bubeck, Vianney Perchet, Philippe Rigollet

Predictive Bandits

We introduce and study a new class of stochastic bandit problems, referred to as predictive bandits. In each round, the decision maker first decides whether to gather information about the rewards of particular arms (so that their rewards…

Machine Learning · Computer Science 2020-04-03 Simon Lindståhl, Alexandre Proutiere, Andreas Johnsson

On Explore-Then-Commit Strategies

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by…

Statistics Theory · Mathematics 2016-11-15 Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

Logarithmic regret bounds for Bandits with Knapsacks

Optimal regret bounds for Multi-Armed Bandit problems are now well documented. They can be classified into two categories based on the growth rate with respect to the time horizon $T$: (i) small, distribution-dependent, bounds of order of…

Data Structures and Algorithms · Computer Science 2017-04-12 Arthur Flajolet, Patrick Jaillet

Constrained regret minimization for multi-criterion multi-armed bandits

We consider a stochastic multi-armed bandit setting and study the problem of constrained regret minimization over a given time horizon. Each arm is associated with an unknown, possibly multi-dimensional distribution, and the merit of an arm…

Machine Learning · Computer Science 2023-01-05 Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan

Online Learning and Bandits with Queried Hints

We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step, the online policy can probe and find out which of a small number ($k$) of choices has better reward (or loss) before making its…

Data Structures and Algorithms · Computer Science 2022-11-08 Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala