Related papers: Multi-armed bandit problem with precedence relatio…

Max-Min Grouped Bandits

In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find the group whose worst arm has the highest mean reward. This…

Machine Learning · Statistics 2022-03-16 Zhenlin Wang, Jonathan Scarlett

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…

Machine Learning · Computer Science 2025-01-28 Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

The Multi-Armed Bandit, with Constraints

The early sections of this paper present an analysis of a Markov decision model that is known as the multi-armed bandit under the assumption that the utility function of the decision maker is either linear or exponential. The analysis…

Optimization and Control · Mathematics 2012-03-22 Eric V. Denardo, Eugene A. Feinberg, Uriel G. Rothblum

Multi-Armed Bandits with Minimum Aggregated Revenue Constraints

We examine a multi-armed bandit problem with contextual information, where the objective is to ensure that each arm receives a minimum aggregated reward across contexts while simultaneously maximizing the total cumulative reward. This…

Machine Learning · Computer Science 2025-10-15 Ahmed Ben Yahmed, Hafedh El Ferchichi, Marc Abeille, Vianney Perchet

Threshold-Based Optimal Arm Selection in Monotonic Bandits: Regret Lower Bounds and Algorithms

In multi-armed bandit problems, the typical goal is to identify the arm with the highest reward. This paper explores a threshold-based bandit problem, aiming to select an arm based on its relation to a prescribed threshold $\tau$. We…

Machine Learning · Computer Science 2025-09-03 Chanakya Varude, Jay Chaudhary, Siddharth Kaushik, Prasanna Chaporkar

On Regret with Multiple Best Arms

We study a regret minimization problem with the existence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable or much larger than the time horizon, and…

Machine Learning · Statistics 2020-10-23 Yinglun Zhu, Robert Nowak

Optimal strategies for a class of sequential control problems with precedence relations

Consider the following multi-phase project management problem. Each project is divided into several phases. All projects enter the next phase at the same point chosen by the decision maker based on observations up to that point. Within each…

Statistics Theory · Mathematics 2007-06-13 Hock Peng Chan, Cheng-Der Fuh, Inchi Hu

Algorithm Design and Stronger Guarantees for the Improving Multi-Armed Bandits Problem

The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection…

Machine Learning · Computer Science 2026-05-22 Avrim Blum, Marten Garicano, Kavya Ravichandran, Dravyansh Sharma

Contextual Blocking Bandits

We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms' mean rewards. However, playing an arm blocks it (across all contexts) for a…

Machine Learning · Computer Science 2020-06-18 Soumya Basu, Orestis Papadigenopoulos, Constantine Caramanis, Sanjay Shakkottai

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the…

Machine Learning · Computer Science 2020-03-26 P Sharoff, Nishant A. Mehta, Ravi Ganti

Blocking Bandits

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…

Machine Learning · Computer Science 2024-07-31 Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai

Query-Reward Tradeoffs in Multi-Armed Bandits

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis, Yonathan Efroni, Shie Mannor

Bandits with many optimal arms

We consider a stochastic bandit problem with a possibly infinite number of arms. We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal…

Machine Learning · Computer Science 2021-11-08 Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier

Phase Transitions in Bandits with Switching Constraints

We consider the classical stochastic multi-armed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget. For this problem, we prove matching upper and…

Machine Learning · Computer Science 2021-03-22 David Simchi-Levi, Yunzong Xu

Minimax Optimal Algorithms for Adversarial Bandit Problem with Multiple Plays

We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching $m$-arm strategy with minimax optimal…

Machine Learning · Computer Science 2019-12-02 N. Mert Vural, Hakan Gokcesu, Kaan Gokcesu, Suleyman S. Kozat

Infinite Arms Bandit: Optimality via Confidence Bounds

Berry et al. (1997) initiated the development of the infinite arms bandit problem. They derived a regret lower bound of all allocation strategies for Bernoulli rewards with uniform priors, and proposed strategies based on success runs.…

Machine Learning · Statistics 2020-06-23 Hock Peng Chan, Shouri Hu

Be Greedy in Multi-Armed Bandits

The Greedy algorithm is the simplest heuristic in sequential decision problems that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing

This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising from LLM applications, edge intelligence, etc. The model is composed of $M$ arms and $K$ plays. Each arm has a stochastic…

Artificial Intelligence · Computer Science 2025-12-29 Hong Xie, Haoran Gu, Yanying Huang, Tao Tan, Defu Lian

Conservative Bandits

We study a novel multi-armed bandit problem that models the challenge faced by a company wishing to explore new strategies to maximize revenue whilst simultaneously maintaining their revenue above a fixed baseline, uniformly over time.…

Machine Learning · Statistics 2016-02-16 Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

Max-Utility Based Arm Selection Strategy For Sequential Query Recommendations

We consider the query recommendation problem in closed loop interactive learning settings like online information gathering and exploratory analytics. The problem can be naturally modelled using the Multi-Armed Bandits (MAB) framework with…

Machine Learning · Computer Science 2024-03-29 Shameem A. Puthiya Parambath, Christos Anagnostopoulos, Roderick Murray-Smith, Sean MacAvaney, Evangelos Zervas