Related papers: Asymptotically Optimal Algorithms for Budgeted Mul…

Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays

We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms, but also their order, which influences reward distribution. In several problem formulations…

Machine Learning · Computer Science 2015-07-20 Aleksandr Vorobev , Gleb Gusev

Thompson Sampling for Budgeted Multi-armed Bandits

Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend Thompson sampling to Budgeted MAB, where there is a random cost for pulling an arm and the total cost is constrained by…

Machine Learning · Computer Science 2015-05-04 Yingce Xia , Haifang Li , Tao Qin , Nenghai Yu , Tie-Yan Liu

Budget-Constrained Multi-Armed Bandits with Multiple Plays

We study the multi-armed bandit problem with multiple plays and a budget constraint for both the stochastic and the adversarial setting. At each round, exactly $K$ out of $N$ possible arms have to be played (with $1\leq K \leq N$). In…

Machine Learning · Computer Science 2017-11-17 Datong P. Zhou , Claire J. Tomlin

An Asymptotically Optimal Strategy for Constrained Multi-armed Bandit Problems

For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_t$-greedy strategy. We…

Optimization and Control · Mathematics 2018-05-04 Hyeong Soo Chang

Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known bandit…

Machine Learning · Statistics 2015-12-18 Apostolos N. Burnetas , Odysseas Kanavetas , Michael N. Katehakis

Blocking Bandits

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…

Machine Learning · Computer Science 2024-07-31 Soumya Basu , Rajat Sen , Sujay Sanghavi , Sanjay Shakkottai

Threshold-Based Optimal Arm Selection in Monotonic Bandits: Regret Lower Bounds and Algorithms

In multi-armed bandit problems, the typical goal is to identify the arm with the highest reward. This paper explores a threshold-based bandit problem, aiming to select an arm based on its relation to a prescribed threshold $\tau$. We…

Machine Learning · Computer Science 2025-09-03 Chanakya Varude , Jay Chaudhary , Siddharth Kaushik , Prasanna Chaporkar

Minimax Optimal Algorithms for Adversarial Bandit Problem with Multiple Plays

We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching $m$-arm strategy with minimax optimal…

Machine Learning · Computer Science 2019-12-02 N. Mert Vural , Hakan Gokcesu , Kaan Gokcesu , Suleyman S. Kozat

Profitable Bandits

Originally motivated by default risk management applications, this paper investigates a novel problem, referred to as the profitable bandit problem here. At each step, an agent chooses a subset of the K possible actions. For each action…

Machine Learning · Statistics 2018-05-09 Mastane Achab , Stephan Clémençon , Aurélien Garivier

Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention

We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic innovation: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step,…

Machine Learning · Computer Science 2026-03-24 Junwen Yang , Tianyuan Jin , Vincent Y. F. Tan

On Adaptive Estimation for Dynamic Bernoulli Bandits

The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm…

Machine Learning · Statistics 2018-05-16 Xue Lu , Niall Adams , Nikolas Kantas

Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically…

Machine Learning · Statistics 2019-03-22 Junpei Komiyama , Junya Honda , Hiroshi Nakagawa

Asymptotically Optimal Bandits under Weighted Information

We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round by spreading the resources usually allocated to only one arm. At each iteration the agent selects a…

Machine Learning · Computer Science 2021-06-01 Matias I. Müller , Cristian R. Rojas

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the…

Machine Learning · Computer Science 2022-06-28 Yifan Lin , Yuhao Wang , Enlu Zhou

The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

We investigate a Bayesian $k$-armed bandit problem in the many-armed regime, where $k \geq \sqrt{T}$ and $T$ represents the time horizon. Initially, and aligned with recent literature on many-armed bandit problems, we observe that…

Machine Learning · Computer Science 2024-03-21 Mohsen Bayati , Nima Hamidi , Ramesh Johari , Khashayar Khosravi

Query-Reward Tradeoffs in Multi-Armed Bandits

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis , Yonathan Efroni , Shie Mannor

Lenient Regret for Multi-Armed Bandits

We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between…

Machine Learning · Computer Science 2021-09-14 Nadav Merlis , Shie Mannor

Bayesian Algorithms for Decentralized Stochastic Bandits

We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network. In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently…

Machine Learning · Statistics 2020-10-29 Anusha Lalitha , Andrea Goldsmith

Multi-armed Bandits with Cost Subsidy

In this paper, we consider a novel variant of the multi-armed bandit (MAB) problem, MAB with cost subsidy, which models many real-life applications where the learning agent has to pay to select an arm and is concerned about optimizing…

Machine Learning · Computer Science 2021-03-16 Deeksha Sinha , Karthik Abinav Sankararaman , Abbas Kazerouni , Vashist Avadhanula

Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem…

Machine Learning · Computer Science 2014-05-20 Stefan Magureanu , Richard Combes , Alexandre Proutiere