Related papers: Approximation Algorithms for Restless Bandit Probl…

200 papers

In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are $N$ arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A player seeks to activate $K \geq 1$ arms at each time in order…

Optimization and Control · Mathematics 2011-12-25 Wenhan Dai , Yi Gai , Bhaskar Krishnamachari , Qing Zhao

In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are $N$ arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A player seeks to activate $K \geq 1$ arms at each time in order…

Optimization and Control · Mathematics 2010-11-23 Wenhan Dai , Yi Gai , Bhaskar Krishnamachari , Qing Zhao
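The Bayesian RMAB setting described in the two entries above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the truncated abstracts: every arm is a two-state (0 = bad, 1 = good) Markov chain with the same known transition probabilities `p01` and `p11`, the reward is 1 when an activated arm is in the good state, and the player follows a myopic policy that activates the $K$ arms most likely to be in the good state. The function names are hypothetical.

```python
import random


def update_belief(belief, observed, p01, p11):
    """Belief that an arm is in the good state (1) at the next step.

    belief: current probability of state 1; observed: the state seen this step if
    the arm was activated, else None. p01 = P(0 -> 1), p11 = P(1 -> 1) are known.
    """
    if observed is not None:
        belief = float(observed)  # activating an arm reveals its current state
    # Propagate one step through the two-state Markov chain.
    return belief * p11 + (1.0 - belief) * p01


def myopic_choice(beliefs, k):
    """Indices of the k arms with the highest belief of being in the good state."""
    ranked = sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)
    return sorted(ranked[:k])


def simulate(n_arms=5, k=2, horizon=1000, p01=0.2, p11=0.8, seed=0):
    """Total reward (1 per activated arm in the good state) of the myopic policy."""
    rng = random.Random(seed)
    stationary = p01 / (p01 + 1.0 - p11)  # stationary P(state = 1)
    states = [1 if rng.random() < stationary else 0 for _ in range(n_arms)]
    beliefs = [stationary] * n_arms
    total = 0
    for _ in range(horizon):
        active = set(myopic_choice(beliefs, k))
        for i in range(n_arms):
            if i in active:
                total += states[i]
                beliefs[i] = update_belief(beliefs[i], states[i], p01, p11)
            else:
                beliefs[i] = update_belief(beliefs[i], None, p01, p11)
        # Restless: every arm transitions, whether or not it was activated.
        states = [1 if rng.random() < (p11 if s == 1 else p01) else 0 for s in states]
    return total
```

The myopic rule is used here only because it is the simplest index to state; it is not claimed to be the policy analysed in the listed paper.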

The multi-armed bandit (MAB) model is one of the most classical models to study decision-making in an uncertain environment. In this model, a player chooses one of $K$ possible arms of a bandit machine to play at each time step, where the…

Machine Learning · Computer Science 2023-06-13 Bo Li , Chi Ho Yeung
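For the classical MAB model summarized in the entry above (one of $K$ arms played per time step), a standard baseline is UCB1 (Auer, Cesa-Bianchi and Fischer, 2002). The sketch below is that textbook algorithm, not the method of the listed paper; Bernoulli rewards and the function name `ucb1` are assumptions for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given success probabilities.

    Returns how many times each arm was played.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k  # plays per arm
    sums = [0.0] * k  # total reward per arm
    for t in range(horizon):
        if t < k:
            arm = t  # play every arm once to initialise the estimates
        else:
            # Index: empirical mean + sqrt(2 ln n / n_i), where n = t plays so far.
            arm = max(range(k), key=lambda i: (sums[i] / counts[i]
                                               + math.sqrt(2.0 * math.log(t) / counts[i])))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Running `ucb1([0.1, 0.5, 0.9], 5000)` should concentrate most plays on the third arm, since suboptimal arms are played only about logarithmically often.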

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science 2018-03-22 Onur Atan , Cem Tekin , Mihaela van der Schaar

We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (which we call the reward interval), and the player receives partial rewards…

Machine Learning · Computer Science 2020-12-16 Siwei Wang , Haoyun Wang , Longbo Huang
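The composite and anonymous feedback model in the entry above can be made concrete by simulating only the observation process (no learning algorithm). This sketch assumes, purely for illustration, that each pull's reward is split evenly over a reward interval of fixed length `interval`; the spreading pattern in the paper may differ, and `anonymous_feedback` is a hypothetical name.

```python
def anonymous_feedback(rewards, interval):
    """Per-step observations when each pull's reward is spread over `interval` steps.

    rewards[t] is the total reward of the pull made at step t. Its value is split
    evenly over steps t, ..., t + interval - 1 (truncated at the horizon). The
    player sees only the per-step sum, so it cannot tell which pull (or arm)
    produced each portion.
    """
    horizon = len(rewards)
    observed = [0.0] * horizon
    for t, r in enumerate(rewards):
        for s in range(t, min(t + interval, horizon)):
            observed[s] += r / interval
    return observed
```

For example, `anonymous_feedback([1.0, 0.0, 2.0], 2)` returns `[0.5, 0.5, 1.0]`: the observation at step 1 is the tail of the first pull's reward, but nothing in it says so.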

A recent line of research focuses on the stochastic multi-armed bandit (MAB) problem in the case where temporal correlations of a specific structure are imposed between the player's actions and the reward distributions of the…

Machine Learning · Computer Science 2021-03-02 Orestis Papadigenopoulos , Constantine Caramanis

In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward.…

Optimization and Control · Mathematics 2015-01-30 Cem Tekin , Mingyan Liu

We consider the scheduling problem concerning N projects. Each project evolves as a multi-state Markov process. At each time instant, one project is scheduled to work, and some reward depending on the state of the chosen project is…

Optimization and Control · Mathematics 2016-02-02 Kehao Wang

The stochastic multi-armed bandit (MAB) problem is a common model for sequential decision problems. In the standard setup, a decision maker has to choose at every instant between several competing arms, each of them provides a scalar random…

Machine Learning · Statistics 2021-10-27 Asaf Cassel , Shie Mannor , Assaf Zeevi

We study the multi-armed bandit problem with arms which are Markov chains with rewards. In the finite-horizon setting, the celebrated Gittins indices do not apply, and the exact solution is intractable. We provide approximation algorithms…

Data Structures and Algorithms · Computer Science 2016-09-14 Will Ma

We consider a stochastic multi-armed bandit (MAB) problem motivated by "large" action spaces, and endowed with a population of arms containing exactly $K$ arm-types, each characterized by a distinct mean reward. The decision maker is…

Machine Learning · Computer Science 2023-01-19 Anand Kalvit , Assaf Zeevi

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards,…

Machine Learning · Computer Science 2021-06-07 Tal Lancewicki , Shahar Segal , Tomer Koren , Yishay Mansour
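To illustrate the delayed-feedback setting in the entry above, the sketch below runs a plain UCB1 that updates its statistics only when a reward actually arrives. It is not the algorithm from the listed paper: delays are assumed reward-independent and uniform on `0..max_delay`, rewards are Bernoulli, and `delayed_ucb` is a hypothetical name.

```python
import heapq
import math
import random


def delayed_ucb(means, horizon, max_delay, seed=0):
    """UCB1 that only learns from feedback that has arrived; returns pulls per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k  # rewards observed so far, per arm
    sums = [0.0] * k
    pulls = [0] * k   # all pulls, including those whose feedback is still pending
    pending = []      # min-heap of (arrival_step, arm, reward)
    for t in range(horizon):
        # Deliver every reward whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, a, r = heapq.heappop(pending)
            counts[a] += 1
            sums[a] += r
        unseen = [i for i in range(k) if counts[i] == 0]
        if unseen:
            arm = min(unseen, key=lambda i: pulls[i])  # explore arms with no feedback yet
        else:
            n = sum(counts)
            arm = max(range(k), key=lambda i: (sums[i] / counts[i]
                                               + math.sqrt(2.0 * math.log(n) / counts[i])))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        # The reward becomes visible at step t + delay (at the earliest, the next step).
        heapq.heappush(pending, (t + rng.randint(0, max_delay), arm, reward))
    return pulls
```

Counting only arrived samples keeps the confidence bounds valid; the cost of delay shows up as extra pulls made while feedback is still in flight.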

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested…

Machine Learning · Computer Science 2022-12-08 Alberto Maria Metelli , Francesco Trovò , Matteo Pirola , Marcello Restelli

In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. However, it is known to be PSPACE-hard to approximate to…

Machine Learning · Computer Science 2011-04-29 Quan Liu , Kehao Wang , Lin Chen

We explore a novel setting of the Multi-Armed Bandit (MAB) problem inspired by real-world applications, which we call bandits with "stochastic delayed composite anonymous feedback (SDCAF)". In SDCAF, the rewards from pulling arms are…

Machine Learning · Computer Science 2019-10-14 Siddhant Garg , Aditya Kumar Akash

Restless multi-armed bandits (RMAB) play a central role in modeling sequential decision making problems under an instantaneous activation constraint that at most B arms can be activated at any decision epoch. Each restless arm is endowed…

Machine Learning · Computer Science 2024-05-03 Guojun Xiong , Jian Li

We consider a restless multi-armed bandit (RMAB) in which there are two types of arms, say A and B. Each arm can be in one of two states, say $0$ or $1$. Playing a type A arm brings it to state $0$ with probability one and not playing it…

Systems and Control · Computer Science 2017-04-11 Rahul Meshram , Aditya Gopalan , D. Manjunath

In this paper we investigate the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike previous results that assume bounded/sub-Gaussian reward distributions, we focus on the setting…

Machine Learning · Computer Science 2022-03-25 Youming Tao , Yulian Wu , Peng Zhao , Di Wang

For the stochastic multi-armed bandit (MAB) problem under a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_t$-greedy strategy. We…

Optimization and Control · Mathematics 2018-05-04 Hyeong Soo Chang
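For reference, the classical $\epsilon_t$-greedy strategy that the entry above extends (Auer, Cesa-Bianchi and Fischer, 2002) explores with probability $\epsilon_t = \min\{1, cK/(d^2 t)\}$ and otherwise plays the arm with the highest empirical mean. The sketch below is only that unconstrained classical version, not the constrained extension from the listed paper; Bernoulli rewards, the default `c` and `d`, and the name `epsilon_t_greedy` are assumptions.

```python
import random


def epsilon_t_greedy(means, horizon, c=5.0, d=0.1, seed=0):
    """Classical epsilon_t-greedy on Bernoulli arms; returns plays per arm.

    The exploration rate decays as eps_t = min(1, c*K / (d**2 * t)); d should be a
    lower bound on the gap between the best and second-best means.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        eps = min(1.0, c * k / (d * d * t))
        if rng.random() < eps:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            # Exploit the highest empirical mean; never-played arms get priority.
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i] if counts[i] else float("inf"))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

The $1/t$ decay is what separates this from fixed-$\epsilon$ greedy: exploration is front-loaded, so the number of exploratory plays grows only logarithmically in the horizon.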

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…

Machine Learning · Computer Science 2024-07-31 Soumya Basu , Rajat Sen , Sujay Sanghavi , Sanjay Shakkottai
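The blocking constraint in the entry above (a played arm is unavailable for a fixed number of slots) can be illustrated with a simple greedy heuristic under known mean rewards: in each slot, play the available arm with the largest mean, or stay idle if every arm is blocked. This is an illustrative assumption rather than the algorithm of the listed paper; `greedy_blocking` and the per-arm `blocking` durations are hypothetical.

```python
def greedy_blocking(means, blocking, horizon):
    """Greedy play in a blocking bandit with known means.

    blocking[i] is the number of slots arm i stays unavailable after it is played.
    Returns the schedule (arm index, or None for an idle slot) and the expected
    reward it collects.
    """
    k = len(means)
    free_at = [0] * k  # first slot at which each arm is available again
    schedule = []
    total = 0.0
    for t in range(horizon):
        available = [i for i in range(k) if free_at[i] <= t]
        if not available:
            schedule.append(None)  # every arm is blocked: idle slot
            continue
        arm = max(available, key=lambda i: means[i])
        free_at[arm] = t + blocking[arm] + 1
        schedule.append(arm)
        total += means[arm]
    return schedule, total
```

For instance, with `means=[0.9, 0.5]` and `blocking=[1, 0]`, the best arm can be used only every other slot, so over four slots the greedy schedule alternates `[0, 1, 0, 1]`.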