Related papers: Blocking Bandits

200 papers

Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since it was last pulled. In this direction, a simple (yet widely applicable)…

Machine Learning · Computer Science · 2021-05-25 · Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai

We investigate a Bayesian $k$-armed bandit problem in the many-armed regime, where $k \geq \sqrt{T}$ and $T$ represents the time horizon. Initially, and aligned with recent literature on many-armed bandit problems, we observe that…

Machine Learning · Computer Science · 2024-03-21 · Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards. However, playing an arm blocks it (across all contexts) for a…

Machine Learning · Computer Science · 2020-06-18 · Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

We consider a stochastic bandit problem with a possibly infinite number of arms. We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal…

Machine Learning · Computer Science · 2021-11-08 · Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier

We propose a novel combinatorial stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time step $t\in [T]$ is…

Machine Learning · Computer Science · 2023-12-14 · Fares Fourati, Christopher John Quinn, Mohamed-Slim Alouini, Vaneet Aggarwal

We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an…

Machine Learning · Statistics · 2019-09-13 · Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz

We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the…

Machine Learning · Computer Science · 2020-03-26 · P Sharoff, Nishant A. Mehta, Ravi Ganti

The Greedy algorithm is the simplest heuristic in sequential decision problems: it carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science · 2021-01-05 · Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.…

Machine Learning · Statistics · 2020-02-20 · Leonardo Cella, Nicolò Cesa-Bianchi

A recent line of research focuses on the study of the stochastic multi-armed bandit (MAB) problem, in the case where temporal correlations of specific structure are imposed between the player's actions and the reward distributions of the…

Machine Learning · Computer Science · 2021-03-02 · Orestis Papadigenopoulos, Constantine Caramanis

Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems. Most research on this problem focuses exclusively on settings in which players have full…

Machine Learning · Computer Science · 2022-12-14 · Guojun Xiong, Jian Li

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science · 2022-10-28 · Nadav Merlis, Yonathan Efroni, Shie Mannor

We consider stochastic bandit problems with a continuous set of arms and where the expected reward is a continuous and unimodal function of the arm. No further assumption is made regarding the smoothness and the structure of the expected…

Machine Learning · Computer Science · 2015-03-09 · Richard Combes, Alexandre Proutiere

Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…

Statistics Theory · Mathematics · 2008-12-18 · Aurélien Garivier, Eric Moulines

We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to find the best super arm. Among a variety of problem settings of the CPE, we…

Machine Learning · Computer Science · 2021-10-26 · Yuko Kuroki, Liyuan Xu, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science · 2018-03-22 · Onur Atan, Cem Tekin, Mihaela van der Schaar

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when…

Machine Learning · Computer Science · 2018-02-23 · Zhiyang Wang, Ruida Zhou, Cong Shen

We study a multi-armed bandit problem in a dynamic environment where arm rewards evolve in a correlated fashion according to a Markov chain. Unlike much of the work on related problems, in our formulation a learning algorithm does…

Machine Learning · Computer Science · 2019-03-05 · Tanner Fiez, Shreyas Sekar, Lillian J. Ratliff

Classical multi-armed bandit problems use the expected value of an arm as a metric to evaluate its goodness. However, the expected value is a risk-neutral metric. In many applications like finance, one is interested in balancing the…

Machine Learning · Computer Science · 2019-06-04 · Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science · 2015-09-29 · Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama