Related papers: On ergodic two-armed bandits
One of two independent stochastic processes (arms) is to be selected at each of n stages. The selection is sequential and depends on past observations as well as the prior information. Observations from arm i are independent given a…
In the multiarmed bandit problem, a gambler chooses an arm of a slot machine to pull, considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…
We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean…
Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.…
We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…
We consider the best-arm identification problem in multi-armed bandits, which focuses purely on exploration. A player is given a fixed budget to explore a finite set of arms, and the rewards of each arm are drawn independently from a fixed,…
The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures…
For the constrained multi-armed bandit model, we show that by construction there exists an index-based deterministic asymptotically optimal algorithm. The optimality is achieved by the convergence of the probability of choosing an…
We introduce the safe linear stochastic bandit framework, a generalization of linear stochastic bandits, where, in each stage, the learner is required to select an arm with an expected reward that is no less than a predetermined (safe)…
We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_{d}(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r :B_{d}(1+\nu) \rightarrow \mathbb{R}$ are considered to…
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$…
We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the…
The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection…
We consider stochastic bandit problems with a continuous set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected…
The multi-armed bandit is a concise model for the problem of iterated decision-making under uncertainty. In each round, a gambler must pull one of $K$ arms of a slot machine, without any foreknowledge of their payouts, except that they are…
We study the stochastic linear optimization problem with bandit feedback. The set of arms takes values in an $N$-dimensional space and belongs to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…
In this paper, we consider the problem of multi-armed bandits with a large, possibly infinite number of correlated arms. We assume that the arms have Bernoulli distributed rewards, independent across time, where the probabilities of success…
We consider a non-stationary formulation of the stochastic multi-armed bandit where the rewards are no longer assumed to be identically distributed. For the best-arm identification task, we introduce a version of Successive Elimination…
Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since it was last pulled. In this direction, a simple (yet widely applicable)…
We propose the first fully-adaptive algorithm for pure exploration in linear bandits: the task of finding the arm with the largest expected reward, which depends linearly on an unknown parameter. While existing methods partially or entirely…