Related papers: Differentiable Linear Bandit Algorithm
Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration…
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be…
Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax…
Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework where the top-K arms are chosen…
We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance…
We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary machine learning models into multi-armed bandit frameworks. A fundamental challenge in deploying sophisticated ML models for sequential…
We obtain the upper bound of the loss function for a strategy in the multi-armed bandit problem with Gaussian distributions of incomes. Considered strategy is an asymptotic generalization of the strategy proposed by J. Bather for the…
Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit…
In this paper, we discuss the asymptotic behavior of the Upper Confidence Bound (UCB) algorithm in the context of multiarmed bandit problems and discuss its implication in downstream inferential tasks. While inferential tasks become…
The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation.…
We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a…
Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…
Motivated by wireless networks where interference or channel state estimates provide partial insight into throughput, we study a variant of the classical stochastic multi-armed bandit problem in which the learner has limited access to…
We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal randomness but…
The multi-armed bandit (MAB) problem is a classical problem that models sequential decision-making under uncertainty in reinforcement learning. In this study, we propose a new generalized upper confidence bound (UCB) algorithm (GWA-UCB1) by…
By leveraging the representation power of deep neural networks, neural upper confidence bound (UCB) algorithms have shown success in contextual bandits. To further balance the exploration and exploitation, we propose…
In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB…
We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when…
We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and…
This paper studies a decentralized homogeneous multi-armed bandit problem in a multi-agent network. The problem is simultaneously solved by $N$ agents assuming they face a common set of $M$ arms and share the same arms' reward…