Related papers: Information-Directed Selection for Top-Two Algorit…
While experimental design often focuses on selecting the single best alternative from a finite set (e.g., in ranking and selection or best-arm identification), many pure-exploration problems pursue richer goals. Given a specific goal,…
Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms. They select the next arm to sample from by randomizing among two…
In the Best-$K$ identification problem (Best-$K$-Arm), we are given $N$ stochastic bandit arms with unknown reward distributions. Our goal is to identify the $K$ arms with the largest means with high confidence, by drawing samples from the…
This paper considers the optimal adaptive allocation of measurement effort for identifying the best among a finite set of options or designs. An experimenter sequentially chooses designs to measure and observes noisy signals of their…
We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist…
We consider the problem of the best arm identification in the presence of stochastic constraints, where there is a finite number of arms associated with multiple performance measures. The goal is to identify the arm that optimizes the…
We study the problem of best-arm identification with fixed confidence in stochastic linear bandits. The objective is to identify the best arm with a given level of certainty while minimizing the sampling budget. We devise a simple algorithm…
Top-$2$ methods have become popular in solving the best arm identification (BAI) problem. The best arm, or the arm with the largest mean amongst finitely many, is identified through an algorithm that at any sequential step independently…
In the Best-$k$-Arm problem, we are given $n$ stochastic bandit arms, each associated with an unknown reward distribution. We are required to identify the $k$ arms with the largest means by taking as few samples as possible. In this paper,…
We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection. The learner can only sample arms via certain exploration bundles, which we refer to as boxes. In…
The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection…
This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models. For this problem, many policies have been proposed, but most of them require solving…
We study the fixed-confidence best-arm identification problem in unimodal bandits, in which the means of the arms increase with the index of the arm up to their maximum, then decrease. We derive two lower bounds on the stopping time of any…
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to…
Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one…
We study best-arm identification in stochastic multi-armed bandits under the fixed-confidence setting, focusing on instances with multiple optimal arms. Unlike prior work that addresses the unknown-number-of-optimal-arms case, we consider…
The pure-exploration problem in stochastic multi-armed bandits aims to find one or more arms with the largest (or near largest) means. Examples include finding an {\epsilon}-good arm, best-arm identification, top-k arm identification, and…
In pure-exploration problems, information is gathered sequentially to answer a question on the stochastic environment. While best-arm identification for linear bandits has been extensively studied in recent years, few works have been…
We consider a Bayesian budgeted multi-armed bandit problem, in which each arm consumes a different amount of resources when selected and there is a budget constraint on the total amount of resources that can be used. Budgeted Thompson…
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step. However, since the arms are adaptively selected--thereby yielding…