Related papers: Bootstrapping Upper Confidence Bound

Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits

Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit…

Machine Learning · Computer Science 2024-06-11 Ambrus Tamás, Szabolcs Szentpéteri, Balázs Csanád Csáji

Differentiable Linear Bandit Algorithm

Upper Confidence Bound (UCB) is arguably the most commonly used method for linear multi-arm bandit problems. While conceptually and computationally simple, this method highly relies on the confidence bounds, failing to strike the optimal…

Machine Learning · Computer Science 2020-06-05 Kaige Yang, Laura Toni

UCB algorithms for multi-armed bandits: Precise regret and adaptive inference

Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax…

Statistics Theory · Mathematics 2024-12-10 Qiyang Han, Koulik Khamaru, Cun-Hui Zhang

Hierarchical Upper Confidence Bounds for Constrained Online Learning

The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation.…

Machine Learning · Computer Science 2024-10-28 Ali Baheri

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems

The analysis of online least squares estimation is at the heart of many stochastic sequential decision making problems. We employ tools from the self-normalized processes to provide a simple and self-contained proof of a tail bound of a…

Artificial Intelligence · Computer Science 2011-02-15 Yasin Abbasi-Yadkori, David Pal, Csaba Szepesvari

Upper Confidence Bounds for Combining Stochastic Bandits

We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a…

Machine Learning · Computer Science 2020-12-25 Ashok Cutkosky, Abhimanyu Das, Manish Purohit

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB)…

Machine Learning · Computer Science 2021-10-27 Anand Kalvit, Assaf Zeevi

Rising Rested Bandits: Lower Bounds and Efficient Algorithms

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e. those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a. $arm$). We study a particular case of the rested…

Machine Learning · Statistics 2024-11-28 Marco Fiandri, Alberto Maria Metelli, Francesco Trovò

Efficient-UCBV: An Almost Optimal Algorithm using Variance Estimates

We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting. EUCBV incorporates the arm elimination strategy proposed in…

Machine Learning · Computer Science 2018-07-12 Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran

Graph Feedback Bandits on Similar Arms: With and Without Graph Structures

In this paper, we study the stochastic multi-armed bandit problem with graph feedback. Motivated by applications in clinical trials and recommendation systems, we assume that two arms are connected if and only if they are similar (i.e.,…

Machine Learning · Computer Science 2025-09-18 Han Qi, Fei Guo, Li Zhu, Qiaosheng Zhang

Precise Asymptotics and Refined Regret of Variance-Aware UCB

In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for the Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates…

Machine Learning · Statistics 2025-02-18 Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, Zhengyuan Zhou

Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret. Besides the theoretical results, the new algorithm is simple, efficient and…

Machine Learning · Computer Science 2016-02-25 Tor Lattimore

Feedback graph regret bounds for Thompson Sampling and UCB

We study the stochastic multi-armed bandit problem with the graph-based feedback structure introduced by Mannor and Shamir. We analyze the performance of the two most prominent stochastic bandit algorithms, Thompson Sampling and Upper…

Machine Learning · Computer Science 2020-02-17 Thodoris Lykouris, Eva Tardos, Drishti Wali

UCBoost: A Boosting Approach to Tame Complexity and Optimality for Stochastic Bandits

In this work, we address the open problem of finding low-complexity near-optimal multi-armed bandit algorithms for sequential decision making problems. Existing bandit algorithms are either sub-optimal and computationally simple (e.g.,…

Machine Learning · Computer Science 2018-04-18 Fang Liu, Sinong Wang, Swapna Buccapatnam, Ness Shroff

Learning to Optimize Via Posterior Sampling

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling,…

Machine Learning · Computer Science 2014-02-04 Daniel Russo, Benjamin Van Roy

Combinatorial Bandits without Total Order for Arms

We consider the combinatorial bandits problem, where at each time step, the online learner selects a size-$k$ subset $s$ from the arms set $\mathcal{A}$, where $\left|\mathcal{A}\right| = n$, and observes a stochastic reward of each arm in…

Machine Learning · Computer Science 2021-03-05 Shuo Yang, Tongzheng Ren, Inderjit S. Dhillon, Sujay Sanghavi

Bounded Regret for Finitely Parameterized Multi-Armed Bandits

We consider the problem of finitely parameterized multi-armed bandits where the model of the underlying stochastic environment can be characterized based on a common unknown parameter. The true parameter is unknown to the learning agent.…

Machine Learning · Computer Science 2020-11-10 Kishan Panaganti, Dileep Kalathil

Tuning Confidence Bound for Stochastic Bandits with Bandit Distance

We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance…

Machine Learning · Statistics 2021-10-07 Xinyu Zhang, Srinjoy Das, Ken Kreutz-Delgado

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of…

Machine Learning · Computer Science 2022-10-18 Viktor Bengs, Eyke Hüllermeier

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…

Statistics Theory · Mathematics 2008-12-18 Aurélien Garivier, Eric Moulines