Related papers: Be Greedy in Multi-Armed Bandits

On Regret with Multiple Best Arms

We study a regret minimization problem with the existence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable or much larger than the time horizon, and…

Machine Learning · Statistics 2020-10-23 Yinglun Zhu , Robert Nowak

More Adaptive Algorithms for Adversarial Bandits

We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem). When instantiated differently, our algorithm achieves various new data-dependent regret…

Machine Learning · Computer Science 2018-06-08 Chen-Yu Wei , Haipeng Luo

The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

We investigate a Bayesian $k$-armed bandit problem in the \emph{many-armed} regime, where $k \geq \sqrt{T}$ and $T$ represents the time horizon. Initially, and aligned with recent literature on many-armed bandit problems, we observe that…

Machine Learning · Computer Science 2024-03-21 Mohsen Bayati , Nima Hamidi , Ramesh Johari , Khashayar Khosravi

Algorithms for Linear Bandits on Polyhedral Sets

We study stochastic linear optimization problem with bandit feedback. The set of arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal , Amir Leshem , Venkatesh Saligrama

A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions…

Machine Learning · Computer Science 2018-01-11 Sampath Kannan , Jamie Morgenstern , Aaron Roth , Bo Waggoner , Zhiwei Steven Wu

Global Bandits

Multi-armed bandits (MAB) model sequential decision making problems, in which a learner sequentially chooses arms with unknown reward distributions in order to maximize its cumulative reward. Most of the prior work on MAB assumes that the…

Machine Learning · Computer Science 2018-03-22 Onur Atan , Cem Tekin , Mihaela van der Schaar

Lifelong Learning in Multi-Armed Bandits

Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long standing machine learning problem. In this paper, we study the problem in the multi-armed bandit framework with…

Machine Learning · Computer Science 2020-12-29 Matthieu Jedor , Jonathan Louëdec , Vianney Perchet

Regret Bounds for Batched Bandits

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number…

Data Structures and Algorithms · Computer Science 2020-02-19 Hossein Esfandiari , Amin Karbasi , Abbas Mehrabian , Vahab Mirrokni

Sparse Stochastic Bandits

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward. Guarantees can be obtained on a relative quantity called regret, which scales…

Machine Learning · Computer Science 2017-06-06 Joon Kwon , Vianney Perchet , Claire Vernade

Simple regret for infinitely many armed bandits

We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous…

Machine Learning · Computer Science 2015-05-19 Alexandra Carpentier , Michal Valko

Blocking Bandits

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…

Machine Learning · Computer Science 2024-07-31 Soumya Basu , Rajat Sen , Sujay Sanghavi , Sanjay Shakkottai

Budgeted and Non-budgeted Causal Bandits

Learning good interventions in a causal graph can be modelled as a stochastic multi-armed bandit problem with side-information. First, we study this problem when interventions are more expensive than observations and a budget is specified.…

Machine Learning · Computer Science 2020-12-15 Vineet Nair , Vishakha Patil , Gaurav Sinha

Rate-optimal Bayesian Simple Regret in Best Arm Identification

We consider best arm identification in the multi-armed bandit problem. Assuming certain continuity conditions of the prior, we characterize the rate of the Bayesian simple regret. Differing from Bayesian regret minimization (Lai, 1987), the…

Machine Learning · Computer Science 2023-07-27 Junpei Komiyama , Kaito Ariu , Masahiro Kato , Chao Qin

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of…

Machine Learning · Computer Science 2022-10-18 Viktor Bengs , Eyke Hüllermeier

Constrained regret minimization for multi-criterion multi-armed bandits

We consider a stochastic multi-armed bandit setting and study the problem of constrained regret minimization over a given time horizon. Each arm is associated with an unknown, possibly multi-dimensional distribution, and the merit of an arm…

Machine Learning · Computer Science 2023-01-05 Anmol Kagrecha , Jayakrishnan Nair , Krishna Jagannathan

Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits

This paper is motivated by recent research in the $d$-dimensional stochastic linear bandit literature, which has revealed an unsettling discrepancy: algorithms like Thompson sampling and Greedy demonstrate promising empirical performance,…

Machine Learning · Computer Science 2025-05-20 Yuwei Luo , Mohsen Bayati

Greedy Algorithm almost Dominates in Smoothed Contextual Bandits

Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to…

Machine Learning · Computer Science 2021-12-28 Manish Raghavan , Aleksandrs Slivkins , Jennifer Wortman Vaughan , Zhiwei Steven Wu

Mostly Exploration-Free Algorithms for Contextual Bandits

The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in…

Machine Learning · Statistics 2020-04-21 Hamsa Bastani , Mohsen Bayati , Khashayar Khosravi

Regret Minimization in Heavy-Tailed Bandits

We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded…

Machine Learning · Computer Science 2021-02-09 Shubhada Agrawal , Sandeep Juneja , Wouter M. Koolen

Constant or logarithmic regret in asynchronous multiplayer bandits

Multiplayer bandits have recently been extensively studied because of their application to cognitive radio networks. While the literature mostly considers synchronous players, radio networks (e.g. for IoT) tend to have asynchronous devices.…

Machine Learning · Computer Science 2023-06-01 Hugo Richard , Etienne Boursier , Vianney Perchet