Related papers

Related papers: The Ratio Index for Budgeted Learning, with Applic…

200 papers

This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected…

Optimization and Control · Mathematics 2022-07-29 José Niño-Mora

A sampling-based method is introduced to approximate the Gittins index for a general family of alternative bandit processes. The approximation consists of a truncation of the optimization horizon and support for the immediate rewards, an…

Optimization and Control · Mathematics 2023-07-24 Stef Baas, Richard J. Boucherie, Aleida Braaksma

The Gittins index is a tool that optimally solves a variety of decision-making problems involving uncertainty, including multi-armed bandit problems, minimizing mean latency in queues, and search problems like the Pandora's box model.…

Optimization and Control · Mathematics 2025-08-05 Ziv Scully, Alexander Terenin

The dynamic allocation problem, also known as the 'multi-armed bandit' problem, simulates a situation in which an agent is faced with a tradeoff between actions that yield an immediate reward and actions whose benefits can only be perceived…

Probability · Mathematics 2026-02-03 Christopher Wang

We study the multi-armed bandit problem with arms which are Markov chains with rewards. In the finite-horizon setting, the celebrated Gittins indices do not apply, and the exact solution is intractable. We provide approximation algorithms…

Data Structures and Algorithms · Computer Science 2016-09-14 Will Ma

Restless bandits are an important class of problems with applications in recommender systems, active learning, revenue management and other areas. We consider infinite-horizon discounted restless bandits with many arms where a fixed…

Machine Learning · Computer Science 2022-03-31 Xiangyu Zhang, Peter I. Frazier

Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide…

Computation · Statistics 2023-01-04 James K. He, Sofía S. Villar, Lida Mavrogonatou

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably, it turns out that this approach leads to finite-time regret guarantees comparable to those…

Machine Learning · Computer Science 2016-05-31 Tor Lattimore

We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special…

Machine Learning · Computer Science 2018-10-30 Julian Zimmert, Yevgeny Seldin

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision…

Optimization and Control · Mathematics 2026-01-22 José Niño-Mora

A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios in which the bandit problem arises when deciding which parts…

Information Theory · Computer Science 2012-11-20 Jan Oksanen, Visa Koivunen, H. Vincent Poor

In this paper, we consider a best-action identification problem in the stochastic linear bandit setup with a fixed-confidence constraint. In the considered best-action identification problem, instead of minimizing the cumulative regret as…

Machine Learning · Computer Science 2018-12-04 Jun Geng, Lifeng Lai

In this paper, we consider several finite-horizon Bayesian multi-armed bandit problems with side constraints which are computationally intractable (NP-Hard) and for which no optimal (or near optimal) algorithms are known to exist with…

Data Structures and Algorithms · Computer Science 2013-07-18 Sudipto Guha, Kamesh Munagala

In this paper, we introduce the notion of replicable policies in the context of stochastic bandits, one of the canonical problems in interactive learning. A policy in the bandit environment is called replicable if it pulls, with high…

Machine Learning · Computer Science 2023-02-16 Hossein Esfandiari, Alkis Kalavasis, Amin Karbasi, Andreas Krause, Vahab Mirrokni, Grigoris Velegkas

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized…

Machine Learning · Computer Science 2020-10-09 Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

We present a two-armed bandit model of decision making under uncertainty where the expected return to investing in the "risky arm" increases when choosing that arm and decreases when choosing the "safe" arm. These dynamics are natural in…

Optimization and Control · Mathematics 2017-03-22 Roland Fryer, Philipp Harms

Learning good interventions in a causal graph can be modelled as a stochastic multi-armed bandit problem with side-information. First, we study this problem when interventions are more expensive than observations and a budget is specified.…

Machine Learning · Computer Science 2020-12-15 Vineet Nair, Vishakha Patil, Gaurav Sinha

This paper is about index policies for minimizing (frequentist) regret in a stochastic multi-armed bandit model, inspired by a Bayesian view on the problem. Our main contribution is to prove that the Bayes-UCB algorithm, which relies on…

Machine Learning · Statistics 2017-11-07 Emilie Kaufmann

We address the intractable multi-armed bandit problem with switching costs, for which Asawa and Teneketzis introduced in [M. Asawa and D. Teneketzis. 1996. Multi-armed bandits with switching penalties. IEEE Trans. Automat. Control, 41…

Optimization and Control · Mathematics 2023-04-05 José Niño-Mora

We consider the finite-state restless multi-armed bandit problem. The decision maker can act on M of the N bandits at each time step. Playing an arm (the active arm) yields state-dependent rewards based on the action, and when the arm is not…

Machine Learning · Computer Science 2023-05-02 Vishesh Mittal, Rahul Meshram, Deepak Dev, Surya Prakash