Related papers: Non-stochastic Best Arm Identification and Hyperpa…
We study the problem of best-arm identification with fixed confidence in stochastic linear bandits. The objective is to identify the best arm with a given level of certainty while minimizing the sampling budget. We devise a simple algorithm…
We consider a stochastic bandit problem with a possibly infinite number of arms. We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal…
The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real-world, safety constraints often must be met while…
We study best-arm identification in stochastic multi-armed bandits under the fixed-confidence setting, focusing on instances with multiple optimal arms. Unlike prior work that addresses the unknown-number-of-optimal-arms case, we consider…
This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two key metrics for assessing bandit algorithms are computational efficiency and…
We study a regret minimization problem with the existence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable or much larger than the time horizon, and…
In the Best-$K$ identification problem (Best-$K$-Arm), we are given $N$ stochastic bandit arms with unknown reward distributions. Our goal is to identify the $K$ arms with the largest means with high confidence, by drawing samples from the…
We consider a multi-armed bandit setting with finitely many arms, in which each arm yields an $M$-dimensional vector reward upon selection. We assume that the reward of each dimension (a.k.a. {\em objective}) is generated independently of…
This paper investigates the problem of best arm identification in $\textit{contaminated}$ stochastic multi-arm bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability…
We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous…
We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private)…
Motivated by real-world applications that necessitate responsible experimentation, we introduce the problem of best arm identification (BAI) with minimal regret. This innovative variant of the multi-armed bandit problem elegantly…
We consider a stochastic multi-armed bandit setting and study the problem of constrained regret minimization over a given time horizon. Each arm is associated with an unknown, possibly multi-dimensional distribution, and the merit of an arm…
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to…
We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of their…
We study the problem of selecting $K$ arms with the highest expected rewards in a stochastic $n$-armed bandit game. This problem has a wide range of applications, e.g., A/B testing, crowdsourcing, simulation optimization. Our goal is to…
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret minimization is now well known, our aim is…
In multi-objective decision-making with hierarchical preferences, lexicographic bandits provide a natural framework for optimizing multiple objectives in a prioritized order. In this setting, a learner repeatedly selects arms and observes…
In this paper, we consider a best action identification problem in the stochastic linear bandit setup with a fixed confident constraint. In the considered best action identification problem, instead of minimizing the accumulative regret as…
We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the…