Related papers: A Sampling-Based Method for Gittins Index Approxim…

200 papers

Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide…

Computation · Statistics 2023-01-04 James K. He, Sofía S. Villar, Lida Mavrogonatou

This note gives a short, self-contained proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms. I consider a Gaussian multi-armed bandit problem with discount factor $\gamma$. The Gittins index…

Machine Learning · Computer Science 2019-04-10 Daniel Russo

Adaptive designs for multi-armed clinical trials have become increasingly popular recently in many areas of medical research because of their potential to shorten development times and to increase patient response. However, developing…

Applications · Statistics 2017-03-16 Adam Smith, Sofia S. Villar

Gittins indices provide an optimal solution to the classical multi-armed bandit problem. An obstacle to their use has been the common perception that their computation is very difficult. This paper demonstrates an accessible general…

Machine Learning · Statistics 2019-09-12 James Edwards

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably it turns out that this approach leads to finite-time regret guarantees comparable to those…

Machine Learning · Computer Science 2016-05-31 Tor Lattimore

This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected…

Optimization and Control · Mathematics 2022-07-29 José Niño-Mora

In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Approximation algorithms…

Data Structures and Algorithms · Computer Science 2016-04-12 Ashish Goel, Sanjeev Khanna, Brad Null

We consider the Gittins index for a normal distribution with unknown mean $\theta$ and known variance where $\theta$ has a normal prior. In addition to presenting some monotonicity properties of the Gittins index, we derive an approximation…

Statistics Theory · Mathematics 2007-06-13 Yi-Ching Yao

Many stochastic optimization algorithms work by estimating the gradient of the cost function on the fly by sampling datapoints uniformly at random from a training set. However, the estimator might have a large variance, which inadvertently…

Machine Learning · Computer Science 2017-08-10 Farnood Salehi, L. Elisa Celis, Patrick Thiran

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an…

Machine Learning · Computer Science 2020-01-09 Daniel Russo, Benjamin Van Roy

We study the multi-armed bandit problem with arms which are Markov chains with rewards. In the finite-horizon setting, the celebrated Gittins indices do not apply, and the exact solution is intractable. We provide approximation algorithms…

Data Structures and Algorithms · Computer Science 2016-09-14 Will Ma

This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings. Leveraging ideas from the theory of suprema of empirical processes, we provide an algorithm whose…

Machine Learning · Computer Science 2020-06-23 Julian Katz-Samuels, Lalit Jain, Zohar Karnin, Kevin Jamieson

The Gittins index is a tool that optimally solves a variety of decision-making problems involving uncertainty, including multi-armed bandit problems, minimizing mean latency in queues, and search problems like the Pandora's box model.…

Optimization and Control · Mathematics 2025-08-05 Ziv Scully, Alexander Terenin

This paper proposes a general framework of multi-armed bandit (MAB) processes by introducing a type of restrictions on the switches among arms evolving in continuous time. The Gittins index process is constructed for any single arm subject…

Probability · Mathematics 2021-12-28 Wenqing Bao, Xiaoqiang Cai, Xianyi Wu

We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: it is to the best…

Statistics Theory · Mathematics 2022-03-08 Antoine Barrier, Aurélien Garivier, Tomáš Kocák

Bayesian optimization through Gaussian process regression is an effective method of optimizing an unknown function for which every measurement is expensive. It approximates the objective function and then recommends a new measurement point…

Machine Learning · Statistics 2017-05-17 Hildo Bijl, Thomas B. Schön, Jan-Willem van Wingerden, Michel Verhaegen

In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from…

Machine Learning · Computer Science 2021-06-17 Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Shijian Li, Gang Pan, Hongyang Chen

In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits. At each round, traditional TS requires samples from the current posterior distribution, which is usually intractable. To…

Machine Learning · Statistics 2024-07-23 Pierre Clavier, Tom Huix, Alain Durmus

Multi-arm bandit experimental designs are increasingly being adopted over standard randomized trials due to their potential to improve outcomes for study participants, enable faster identification of the best-performing options, and/or…

Methodology · Statistics 2025-06-04 Brian M Cho, Aurélien Bibaut, Nathan Kallus

We consider the multi-armed bandit problem in non-stationary environments. Based on the Bayesian method, we propose a variant of Thompson Sampling which can be used in both rested and restless bandit scenarios. Applying discounting to the…

Machine Learning · Statistics 2017-08-01 Vishnu Raj, Sheetal Kalyani