Related papers: Inference with the Upper Confidence Bound Algorith…
Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be…
Statistical inference from data generated by multi-armed bandit (MAB) algorithms is challenging due to their adaptive, non-i.i.d. nature. A classical manifestation is that sample averages of arm rewards under bandit sampling may fail to…
Upper Confidence Bound (UCB) is arguably the most commonly used method for linear multi-arm bandit problems. While conceptually and computationally simple, this method highly relies on the confidence bounds, failing to strike the optimal…
Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax…
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be…
In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for the Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates…
One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB)…
The multi-armed bandit (MAB) problem is a classical problem that models sequential decision-making under uncertainty in reinforcement learning. In this study, we propose a new generalized upper confidence bound (UCB) algorithm (GWA-UCB1) by…
A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, they have received…
Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework where the top-K arms are chosen…
We obtain the upper bound of the loss function for a strategy in the multi-armed bandit problem with Gaussian distributions of incomes. Considered strategy is an asymptotic generalization of the strategy proposed by J. Bather for the…
We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a…
Lai and Robbins (1985) and Lai (1987) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the…
We consider a novel multi-armed bandit framework where the rewards obtained by pulling the arms are functions of a common latent random variable. The correlation between arms due to the common random source can be used to design a…
This paper studies a decentralized homogeneous multi-armed bandit problem in a multi-agent network. The problem is simultaneously solved by $N$ agents assuming they face a common set of $M$ arms and share the same arms' reward…
Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…
The principle of optimism in the face of uncertainty is one of the most widely used and successful ideas in multi-armed bandits and reinforcement learning. However, existing optimistic algorithms (primarily UCB and its variants) often…
Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration…
In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB…
The upper confidence bound (UCB) policy is recognized as an order-optimal solution for the classical total-reward bandit problem. While similar UCB-based approaches have been applied to the max bandit problem, which aims to maximize the…