Related papers: Approximation Algorithms for Restless Bandit Probl…

The Non-Bayesian Restless Multi-Armed Bandit: A Case of Near-Logarithmic Strict Regret

In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are $N$ arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A player seeks to activate $K \geq 1$ arms at each time in order…

Optimization and Control · Mathematics 2011-12-25 Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Zhao

The Non-Bayesian Restless Multi-Armed Bandit: a Case of Near-Logarithmic Regret

In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are $N$ arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A player seeks to activate $K \geq 1$ arms at each time in order…

Optimization and Control · Mathematics 2010-11-23 Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Zhao

Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits

The multi-armed bandit (MAB) model is one of the most classical models to study decision-making in an uncertain environment. In this model, a player chooses one of $K$ possible arms of a bandit machine to play at each time step, where the…

Machine Learning · Computer Science 2023-06-13 Bo Li, Chi Ho Yeung

Global Bandits

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science 2018-03-22 Onur Atan, Cem Tekin, Mihaela van der Schaar

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period the reward interval) and the player receives partial rewards…

Machine Learning · Computer Science 2020-12-16 Siwei Wang, Haoyun Wang, Longbo Huang

Recurrent Submodular Welfare and Matroid Blocking Bandits

A recent line of research focuses on the study of the stochastic multi-armed bandits problem (MAB), in the case where temporal correlations of specific structure are imposed between the player's actions and the reward distributions of the…

Machine Learning · Computer Science 2021-03-02 Orestis Papadigenopoulos, Constantine Caramanis

Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems

In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward.…

Optimization and Control · Mathematics 2015-01-30 Cem Tekin, Mingyan Liu

Optimality of Myopic Policy for Restless Multiarmed Bandit with Imperfect Observation

We consider the scheduling problem concerning N projects. Each project evolves as a multi-state Markov process. At each time instant, one project is scheduled to work, and some reward depending on the state of the chosen project is…

Optimization and Control · Mathematics 2016-02-02 Kehao Wang

A General Framework for Bandit Problems Beyond Cumulative Objectives

The stochastic multi-armed bandit (MAB) problem is a common model for sequential decision problems. In the standard setup, a decision maker has to choose at every instant between several competing arms, each of them provides a scalar random…

Machine Learning · Statistics 2021-10-27 Asaf Cassel, Shie Mannor, Assaf Zeevi

Improvements and Generalizations of Stochastic Knapsack and Multi-Armed Bandit Approximation Algorithms: Full Version

We study the multi-armed bandit problem with arms which are Markov chains with rewards. In the finite-horizon setting, the celebrated Gittins indices do not apply, and the exact solution is intractable. We provide approximation algorithms…

Data Structures and Algorithms · Computer Science 2016-09-14 Will Ma

Complexity Analysis of a Countable-armed Bandit Problem

We consider a stochastic multi-armed bandit (MAB) problem motivated by "large" action spaces, and endowed with a population of arms containing exactly $K$ arm-types, each characterized by a distinct mean reward. The decision maker is…

Machine Learning · Computer Science 2023-01-19 Anand Kalvit, Assaf Zeevi

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards,…

Machine Learning · Computer Science 2021-06-07 Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

Stochastic Rising Bandits

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested…

Machine Learning · Computer Science 2022-12-08 Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli

On Optimality of Greedy Policy for a Class of Standard Reward Function of Restless Multi-armed Bandit Problem

In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. However, it is known to be PSPACE-Hard to approximate to…

Machine Learning · Computer Science 2011-04-29 Quan Liu, Kehao Wang, Lin Chen

Stochastic Bandits with Delayed Composite Anonymous Feedback

We explore a novel setting of the Multi-Armed Bandit (MAB) problem inspired from real world applications which we call bandits with "stochastic delayed composite anonymous feedback (SDCAF)". In SDCAF, the rewards on pulling arms are…

Machine Learning · Computer Science 2019-10-14 Siddhant Garg, Aditya Kumar Akash

Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback

Restless multi-armed bandits (RMAB) play a central role in modeling sequential decision making problems under an instantaneous activation constraint that at most B arms can be activated at any decision epoch. Each restless arm is endowed…

Machine Learning · Computer Science 2024-05-03 Guojun Xiong, Jian Li

A Hidden Markov Restless Multi-armed Bandit Model for Playout Recommendation Systems

We consider a restless multi-armed bandit (RMAB) in which there are two types of arms, say A and B. Each arm can be in one of two states, say $0$ or $1$. Playing a type A arm brings it to state $0$ with probability one and not playing it…

Systems and Control · Computer Science 2017-04-11 Rahul Meshram, Aditya Gopalan, D. Manjunath

Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits

In this paper we investigate the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike previous results that assume bounded/sub-Gaussian reward distributions, we focus on the setting…

Machine Learning · Computer Science 2022-03-25 Youming Tao, Yulian Wu, Peng Zhao, Di Wang

An Asymptotically Optimal Strategy for Constrained Multi-armed Bandit Problems

For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that an asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_t$-greedy strategy. We…

Optimization and Control · Mathematics 2018-05-04 Hyeong Soo Chang

Blocking Bandits

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…

Machine Learning · Computer Science 2024-07-31 Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai