Related papers: UCB Algorithm for Exponential Distributions
Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…
We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises…
We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their…
We obtain the upper bound of the loss function for a strategy in the multi-armed bandit problem with Gaussian distributions of incomes. Considered strategy is an asymptotic generalization of the strategy proposed by J. Bather for the…
The multi-armed bandit (MAB) problems are widely studied in fields of operations research, stochastic optimization, and reinforcement learning. In this paper, we consider the classical MAB model with heavy-tailed reward distributions and…
We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from $K$ arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to…
One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB)…
We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when…
Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where…
We study a structured multi-agent multi-armed bandit (MAMAB) problem in a dynamic environment. A graph reflects the information-sharing structure among agents, and the arms' reward distributions are piecewise-stationary with several unknown…
Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit…
The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm…
For a wireless avionics communication system, a Multi-arm bandit game is mathematically formulated, which includes channel states, strategies, and rewards. The simple case includes only two agents sharing the spectrum which is fully studied…
Upper Confidence Bound (UCB) is arguably the most commonly used method for linear multi-arm bandit problems. While conceptually and computationally simple, this method highly relies on the confidence bounds, failing to strike the optimal…
The classical multi-armed bandit (MAB) problem involves a learner and a collection of K independent arms, each with its own ex ante unknown independent reward distribution. At each one of a finite number of rounds, the learner selects one…
We study the nonstationary stochastic Multi-Armed Bandit (MAB) problem in which the distribution of rewards associated with each arm are assumed to be time-varying and the total variation in the expected rewards is subject to a variation…
Multi-armed bandit (MAB) is a widely adopted framework for sequential decision-making under uncertainty. Traditional bandit algorithms rely solely on online data, which tends to be scarce as it must be gathered during the online phase when…
The stochastic multi-armed bandit (MAB) problem is one of the most fundamental models in sequential decision-making, with the core challenge being the trade-off between exploration and exploitation. Although algorithms such as Upper…
Sequential decision-making under uncertainty often involves multiple agents learning which actions (arms) yield the highest rewards through repeated interaction with a stochastic environment. This setting is commonly modeled by cooperative…
The multi-armed bandit(MAB) problem is a simple yet powerful framework that has been extensively studied in the context of decision-making under uncertainty. In many real-world applications, such as robotic applications, selecting an arm…