Related papers: Two-Armed Restless Bandits with Imperfect Informat…

Multi-armed Bandits with Constrained Arms and Hidden States

The problem of rested and restless multi-armed bandits with constrained availability of arms is considered. The states of arms evolve in a Markovian manner and the exact states are hidden from the decision maker. First, some structural…

Systems and Control · Computer Science 2017-10-20 Varun Mehta, Rahul Meshram, Kesav Kaza, S. N. Merchant

Markovian restless bandits and index policies: A review

The restless multi-armed bandit problem is a paradigmatic modeling framework for optimal dynamic priority allocation in stochastic models of wide-ranging applications that has been widely investigated and applied since its inception in a…

General Mathematics · Mathematics 2026-01-26 José Niño-Mora

Two families of indexable partially observable restless bandits and Whittle index computation

We consider the restless bandits with general state space under partial observability with two observational models: first, the state of each bandit is not observable at all, and second, the state of each bandit is observable only if it is…

Systems and Control · Electrical Eng. & Systems 2023-05-25 Nima Akbarzadeh, Aditya Mahajan

Risk-Aware Decision Making in Restless Bandits: Theory and Algorithms for Planning and Learning

In restless bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless bandits problem…

Machine Learning · Computer Science 2026-02-20 Nima Akbarzadeh, Yossiri Adulyasak, Erick Delage

Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits

Restless multi-armed bandits with partially observable states have applications in communication systems, age of information and recommendation systems. In this paper, we study multi-state partially observable restless bandit models. We…

Machine Learning · Computer Science 2021-08-03 Rahul Meshram, Kesav Kaza

Stochastic Rising Bandits

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested…

Machine Learning · Computer Science 2022-12-08 Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli

Replicable Bandits

In this paper, we introduce the notion of replicable policies in the context of stochastic bandits, one of the canonical problems in interactive learning. A policy in the bandit environment is called replicable if it pulls, with high…

Machine Learning · Computer Science 2023-02-16 Hossein Esfandiari, Alkis Kalavasis, Amin Karbasi, Andreas Krause, Vahab Mirrokni, Grigoris Velegkas

Asymptotically optimal priority policies for indexable and nonindexable restless bandits

We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is…

Probability · Mathematics 2016-09-05 I. M. Verloop

On the Whittle Index for Restless Multi-armed Hidden Markov Bandits

We consider a restless multi-armed bandit in which each arm can be in one of two states. When an arm is sampled, the state of the arm is not available to the sampler. Instead, a binary signal with a known randomness that depends on the…

Systems and Control · Computer Science 2017-12-20 Rahul Meshram, D. Manjunath, Aditya Gopalan

Gittins' theorem under uncertainty

We study dynamic allocation problems for discrete time multi-armed bandits under uncertainty, based on the theory of nonlinear expectations. We show that, under strong independence of the bandits and with some relaxation in the…

Optimization and Control · Mathematics 2021-06-16 Samuel N. Cohen, Tanut Treetanthiploet

Best-Arm Identification in Unimodal Bandits

We study the fixed-confidence best-arm identification problem in unimodal bandits, in which the means of the arms increase with the index of the arm up to their maximum, then decrease. We derive two lower bounds on the stopping time of any…

Machine Learning · Computer Science 2025-05-27 Riccardo Poiani, Marc Jourdan, Emilie Kaufmann, Rémy Degenne

Stochastic Bandit Based on Empirical Moments

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura

Restless Bandits with Constrained Arms: Applications in Social and Information Networks

We study a problem of information gathering in a social network with dynamically available sources and time varying quality of information. We formulate this problem as a restless multi-armed bandit (RMAB). In this problem, information…

Systems and Control · Computer Science 2018-01-22 Varun Mehta, Rahul Meshram, Kesav Kaza, S. N. Merchant

Indexability of Finite State Restless Multi-Armed Bandit and Rollout Policy

We consider the finite-state restless multi-armed bandit problem. The decision maker can act on M bandits out of N bandits in each time step. The play of an arm (active arm) yields state-dependent rewards based on action and when the arm is not…

Machine Learning · Computer Science 2023-05-02 Vishesh Mittal, Rahul Meshram, Deepak Dev, Surya Prakash

On the Complexity of Best Arm Identification in Multi-Armed Bandit Models

The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret minimization is now well known, our aim is…

Machine Learning · Statistics 2016-11-15 Emilie Kaufmann, Olivier Cappé, Aurélien Garivier

Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits

We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B. The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both…

Machine Learning · Computer Science 2022-03-25 Guojun Xiong, Jian Li, Rahul Singh

Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting

Rested and Restless Bandits are two well-known bandit settings that are useful to model real-world sequential decision-making problems in which the expected reward of an arm evolves over time due to the actions we perform or due to the…

Machine Learning · Statistics 2024-09-11 Gianmarco Genalti, Marco Mussi, Nicola Gatti, Marcello Restelli, Matteo Castiglioni, Alberto Maria Metelli

Stochastic Bandits with Delay-Dependent Payoffs

Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.…

Machine Learning · Statistics 2020-02-20 Leonardo Cella, Nicolò Cesa-Bianchi

Optimal Data Driven Resource Allocation under Multi-Armed Bandit Observations

This paper introduces the first asymptotically optimal strategy for a multi-armed bandit (MAB) model under side constraints. The side constraints model situations in which bandit activations are limited by the availability of certain…

Machine Learning · Statistics 2025-02-10 Apostolos N. Burnetas, Odysseas Kanavetas, Michael N. Katehakis

Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems

In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward.…

Optimization and Control · Mathematics 2015-01-30 Cem Tekin, Mingyan Liu