Related papers: Corralling Stochastic Bandit Algorithms

Regret Balancing for Bandit and RL Model Selection

We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…

Machine Learning · Computer Science 2020-06-11 Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits

We consider the problem of combining and learning over a set of adversarial bandit algorithms with the goal of adaptively tracking the best one on the fly. The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al.,…

Machine Learning · Computer Science 2022-02-15 Haipeng Luo, Mengxiao Zhang, Peng Zhao, Zhi-Hua Zhou

Regret Bounds for Batched Bandits

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number…

Data Structures and Algorithms · Computer Science 2020-02-19 Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

Best of Both Worlds Model Selection

We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our…

Machine Learning · Computer Science 2022-07-01 Aldo Pacchiano, Christoph Dann, Claudio Gentile

Upper Confidence Bounds for Combining Stochastic Bandits

We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a…

Machine Learning · Computer Science 2020-12-25 Ashok Cutkosky, Abhimanyu Das, Manish Purohit

Confounded Budgeted Causal Bandits

We study the problem of learning 'good' interventions in a stochastic environment modeled by its underlying causal graph. Good interventions refer to interventions that maximize rewards. Specifically, we consider the setting of a…

Machine Learning · Computer Science 2024-01-17 Fateme Jamshidi, Jalal Etesami, Negar Kiyavash

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed…

Machine Learning · Statistics 2020-10-07 Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

Corralling a Band of Bandit Algorithms

We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be…

Machine Learning · Computer Science 2017-06-07 Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

The best of both worlds: stochastic and adversarial bandits

We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is, essentially, optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the square-root worst-case regret of Exp3…

Machine Learning · Computer Science 2012-02-22 Sebastien Bubeck, Aleksandrs Slivkins

Bandit algorithms to emulate human decision making using probabilistic distortions

Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward…

Machine Learning · Computer Science 2023-11-01 Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

A Novel Confidence-Based Algorithm for Structured Bandits

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms. We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the…

Machine Learning · Computer Science 2020-05-26 Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

Multi-Armed Bandits with Correlated Arms

We consider a multi-armed bandit framework where the rewards obtained by pulling different arms are correlated. We develop a unified approach to leverage these reward correlations and present fundamental generalizations of classic bandit…

Machine Learning · Statistics 2021-09-13 Samarth Gupta, Shreyas Chaudhari, Gauri Joshi, Osman Yağan

Discrepancy-Based Algorithms for Non-Stationary Rested Bandits

We study the multi-armed bandit problem where the rewards are realizations of general non-stationary stochastic processes, a setting that generalizes many existing lines of work and analyses. In particular, we present a theoretical analysis…

Machine Learning · Computer Science 2020-09-04 Corinna Cortes, Giulia DeSalvo, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang

On Penalization in Stochastic Multi-armed Bandits

We study an important variant of the stochastic multi-armed bandit (MAB) problem, which takes penalization into consideration. Instead of directly maximizing cumulative expected reward, we need to balance between the total reward and…

Machine Learning · Statistics 2022-11-16 Guanhua Fang, Ping Li, Gennady Samorodnitsky

Bandit algorithms: Letting go of logarithmic regret for statistical robustness

We study regret minimization in a stochastic multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm, and its statistical robustness. Considering broad classes of underlying arms'…

Machine Learning · Computer Science 2020-06-23 Kumar Ashutosh, Jayakrishnan Nair, Anmol Kagrecha, Krishna Jagannathan

Stochastic Bandits Robust to Adversarial Attacks

This paper investigates stochastic multi-armed bandit algorithms that are robust to adversarial attacks, where an attacker can first observe the learner's action and then alter their reward observation. We study two cases of this model,…

Machine Learning · Computer Science 2024-08-19 Xuchuang Wang, Jinhang Zuo, Xutong Liu, John C. S. Lui, Mohammad Hajiesmaili

Combinatorial Bandits Revisited

This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific regret lower bound, and discuss its scaling with the dimension…

Machine Learning · Computer Science 2015-11-09 Richard Combes, M. Sadegh Talebi, Alexandre Proutiere, Marc Lelarge

Stochastic Graph Bandit Learning with Side-Observations

In this paper, we investigate the stochastic contextual bandit with general function space and graph feedback. We propose an algorithm that addresses this problem by adapting to both the underlying graph structures and reward gaps. To the…

Machine Learning · Computer Science 2024-01-09 Xueping Gong, Jiheng Zhang

Correlated Multi-armed Bandits with a Latent Random Source

We consider a novel multi-armed bandit framework where the rewards obtained by pulling the arms are functions of a common latent random variable. The correlation between arms due to the common random source can be used to design a…

Machine Learning · Statistics 2019-01-31 Samarth Gupta, Gauri Joshi, Osman Yağan

Budgeted and Non-budgeted Causal Bandits

Learning good interventions in a causal graph can be modelled as a stochastic multi-armed bandit problem with side-information. First, we study this problem when interventions are more expensive than observations and a budget is specified.…

Machine Learning · Computer Science 2020-12-15 Vineet Nair, Vishakha Patil, Gaurav Sinha