Related papers: Pure Exploration for Multi-Armed Bandit Problems

200 papers

We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous…

Machine Learning · Computer Science 2015-05-19 Alexandra Carpentier, Michal Valko

This paper introduces the framework of multi-armed sampling, which serves as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off…

Machine Learning · Computer Science 2026-05-14 Mohammad Pedramfar, Siamak Ravanbakhsh

We consider a stochastic multi-armed bandit (MAB) problem motivated by "large" action spaces, and endowed with a population of arms containing exactly $K$ arm-types, each characterized by a distinct mean reward. The decision maker is…

Machine Learning · Computer Science 2023-01-19 Anand Kalvit, Assaf Zeevi

There are two variants of the classical multi-armed bandit (MAB) problem that have received considerable attention from machine learning researchers in recent years: contextual bandits and simple regret minimization. Contextual bandits are…

Machine Learning · Statistics 2020-02-27 Aniket Anand Deshmukh, Srinagesh Sharma, James W. Cutler, Mark Moldwin, Clayton Scott

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number…

Data Structures and Algorithms · Computer Science 2020-02-19 Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's…

Machine Learning · Computer Science 2019-06-11 Omar Besbes, Yonatan Gur, Assaf Zeevi

In this paper, we consider the distributed stochastic multi-armed bandit problem, where a global arm set can be accessed by multiple players independently. The players are allowed to exchange their history of observations with each other at…

Machine Learning · Computer Science 2020-02-13 Shuang Liu, Cheng Chen, Zhihua Zhang

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave the highest payoffs in the past and exploring new…

Machine Learning · Computer Science 2012-11-06 Sébastien Bubeck, Nicolò Cesa-Bianchi

The Greedy algorithm is the simplest heuristic in sequential decision problems that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of…

Machine Learning · Computer Science 2022-10-18 Viktor Bengs, Eyke Hüllermeier

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved, if the effective range of the losses in each round is small (e.g. the maximal difference between two losses in a given round). Despite a recent…

Machine Learning · Computer Science 2020-01-03 Nicolò Cesa-Bianchi, Ohad Shamir

We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, where the decision maker plays a complex action rather than a basic arm in each round. The reward of the complex action is some function of…

Machine Learning · Statistics 2013-11-05 Aditya Gopalan, Shie Mannor, Yishay Mansour

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them…

Machine Learning · Computer Science 2015-11-04 Gergely Neu

We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is…

Machine Learning · Computer Science 2020-03-03 Semih Cayci, Atilla Eryilmaz, R. Srikant

We study the stochastic multi-armed bandit problem when one knows the value $\mu^{(\star)}$ of an optimal arm, as well as a positive lower bound on the smallest positive gap $\Delta$. We propose a new randomized policy that attains a…

Statistics Theory · Mathematics 2013-02-13 Sébastien Bubeck, Vianney Perchet, Philippe Rigollet

We introduce and study a new class of stochastic bandit problems, referred to as predictive bandits. In each round, the decision maker first decides whether to gather information about the rewards of particular arms (so that their rewards…

Machine Learning · Computer Science 2020-04-03 Simon Lindståhl, Alexandre Proutiere, Andreas Johnsson

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by…

Statistics Theory · Mathematics 2016-11-15 Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

Optimal regret bounds for Multi-Armed Bandit problems are now well documented. They can be classified into two categories based on the growth rate with respect to the time horizon $T$: (i) small, distribution-dependent, bounds of order of…

Data Structures and Algorithms · Computer Science 2017-04-12 Arthur Flajolet, Patrick Jaillet

We consider a stochastic multi-armed bandit setting and study the problem of constrained regret minimization over a given time horizon. Each arm is associated with an unknown, possibly multi-dimensional distribution, and the merit of an arm…

Machine Learning · Computer Science 2023-01-05 Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan

We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step, the online policy can probe and find out which of a small number ($k$) of choices has better reward (or loss) before making its…

Data Structures and Algorithms · Computer Science 2022-11-08 Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala
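The shared topic of these papers, pure exploration in multi-armed bandits (spend a fixed sampling budget, then recommend one arm, judged by its simple regret), can be illustrated with a minimal sketch. It is not taken from any paper listed here. It assumes Bernoulli rewards, splits the budget uniformly (round-robin) across the arms, and recommends the arm with the highest empirical mean. The function names `uniform_allocation_best_arm` and `simple_regret`, the arm means, and the budget are all hypothetical.

```python
import random


def uniform_allocation_best_arm(means, budget, seed=0):
    """Hypothetical sketch: pull K Bernoulli arms round-robin for `budget`
    rounds, then recommend the arm with the highest empirical mean.
    Ties go to the lowest index (Python's max keeps the first maximum)."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(budget):
        arm = t % k  # round-robin: each arm gets budget // k or budget // k + 1 pulls
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli draw
        pulls[arm] += 1
        totals[arm] += reward
    # Arms never pulled (only possible when budget < k) get -inf so they
    # are never recommended.
    empirical = [totals[i] / pulls[i] if pulls[i] else float("-inf") for i in range(k)]
    best = max(range(k), key=lambda i: empirical[i])
    return best, empirical


def simple_regret(means, recommended):
    """Simple regret: best true mean minus the true mean of the recommended arm."""
    return max(means) - means[recommended]


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical arm means
    best, empirical = uniform_allocation_best_arm(means, budget=3000, seed=1)
    print("recommended arm:", best, "simple regret:", simple_regret(means, best))
```

With these hypothetical means and 1000 pulls per arm, the 0.3 gap between the two best arms makes a wrong recommendation very unlikely. Uniform allocation is easy to analyze, but it spends as much budget on clearly inferior arms as on close contenders. Adaptive allocation strategies aim to avoid that waste.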