Related papers: On ergodic two-armed bandits

200 papers

One of two independent stochastic processes (arms) is to be selected at each of n stages. The selection is sequential and depends on past observations as well as the prior information. Observations from arm i are independent given a…

Statistics Theory · Mathematics 2011-01-26 Yaming Yu

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura

We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean…

Machine Learning · Computer Science 2023-05-30 Pinyan Lu, Chao Tao, Xiaojin Zhang

Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.…

Machine Learning · Statistics 2020-02-20 Leonardo Cella, Nicolò Cesa-Bianchi

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…

Machine Learning · Computer Science 2024-07-31 Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai

We consider the best-arm identification problem in multi-armed bandits, which focuses purely on exploration. A player is given a fixed budget to explore a finite set of arms, and the rewards of each arm are drawn independently from a fixed,…

Machine Learning · Statistics 2017-08-02 Shahin Shahrampour, Mohammad Noshad, Vahid Tarokh

The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures…

Machine Learning · Computer Science 2022-10-13 Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

For the model of constrained multi-armed bandit, we show that by construction there exists an index-based deterministic asymptotically optimal algorithm. The optimality is achieved by the convergence of the probability of choosing an…

Optimization and Control · Mathematics 2020-07-30 Hyeong Soo Chang

We introduce the safe linear stochastic bandit framework (a generalization of linear stochastic bandits) where, in each stage, the learner is required to select an arm with an expected reward that is no less than a predetermined (safe)…

Machine Learning · Statistics 2019-11-22 Kia Khezeli, Eilyan Bitar

We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_{d}(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r :B_{d}(1+\nu) \rightarrow \mathbb{R}$ are considered to…

Machine Learning · Statistics 2017-05-31 Hemant Tyagi, Sebastian Stich, Bernd Gärtner

We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$…

Machine Learning · Computer Science 2023-06-06 Xiao-Yue Gong, Mark Sellke

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the…

Machine Learning · Computer Science 2020-05-26 Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection…

Machine Learning · Computer Science 2026-05-22 Avrim Blum, Marten Garicano, Kavya Ravichandran, Dravyansh Sharma

We consider stochastic bandit problems with a continuous set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected…

Machine Learning · Computer Science 2015-03-09 Richard Combes, Alexandre Proutiere

The multi-armed bandit is a concise model for the problem of iterated decision-making under uncertainty. In each round, a gambler must pull one of $K$ arms of a slot machine, without any foreknowledge of their payouts, except that they are…

Data Structures and Algorithms · Computer Science 2007-05-23 Varsha Dani, Thomas P. Hayes

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

In this paper, we consider the problem of multi-armed bandits with a large, possibly infinite number of correlated arms. We assume that the arms have Bernoulli distributed rewards, independent across time, where the probabilities of success…

Machine Learning · Computer Science 2011-11-21 Chong Jiang, R. Srikant

We consider a non-stationary formulation of the stochastic multi-armed bandit where the rewards are no longer assumed to be identically distributed. For the best-arm identification task, we introduce a version of Successive Elimination…

Artificial Intelligence · Computer Science 2016-09-09 Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard

Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable)…

Machine Learning · Computer Science 2021-05-25 Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai

We propose the first fully-adaptive algorithm for pure exploration in linear bandits, that is, the task of finding the arm with the largest expected reward, which depends linearly on an unknown parameter. While existing methods partially or entirely…

Machine Learning · Statistics 2017-10-17 Liyuan Xu, Junya Honda, Masashi Sugiyama