Related papers: Inference with the Upper Confidence Bound Algorith…

Statistical Inference under Adaptive Sampling with LinUCB

Adaptively collected data has become ubiquitous within modern practice. However, even seemingly benign adaptive sampling schemes can introduce severe biases, rendering traditional statistical inference tools inapplicable. This can be…

Statistics Theory · Mathematics · 2025-12-02 · Wei Fan, Kevin Tan, Yuting Wei

On Instability of Minimax Optimal Optimism-Based Bandit Algorithms

Statistical inference from data generated by multi-armed bandit (MAB) algorithms is challenging due to their adaptive, non-i.i.d. nature. A classical manifestation is that sample averages of arm rewards under bandit sampling may fail to…

Machine Learning · Statistics · 2025-11-25 · Samya Praharaj, Koulik Khamaru

Differentiable Linear Bandit Algorithm

Upper Confidence Bound (UCB) is arguably the most commonly used method for linear multi-armed bandit problems. While conceptually and computationally simple, this method relies heavily on the confidence bounds, failing to strike the optimal…

Machine Learning · Computer Science · 2020-06-05 · Kaige Yang, Laura Toni

UCB algorithms for multi-armed bandits: Precise regret and adaptive inference

Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax…

Statistics Theory · Mathematics · 2024-12-10 · Qiyang Han, Koulik Khamaru, Cun-Hui Zhang
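
For readers new to the $K$-armed setting, the classic UCB1 index (Auer, Cesa-Bianchi and Fischer, 2002), the empirical mean plus $\sqrt{2 \ln t / n_a}$, is the baseline these UCB papers build on. The sketch below is a minimal illustration assuming Bernoulli rewards; the function name `ucb1` and its parameters are hypothetical, and it is not the specific variant analyzed in the paper above.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with success probabilities `means`.

    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # n_a: number of pulls of arm a
    sums = [0.0] * k   # total reward collected from arm a
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_a)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.1, 0.5, 0.9], horizon=5000)
print(counts)  # the 0.9 arm should receive most of the pulls
```

The pull counts `counts` are exactly the random sample sizes whose adaptivity complicates inference on the per-arm sample means.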

lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be…

Machine Learning · Statistics · 2013-12-30 · Kevin Jamieson, Matthew Malloy, Robert Nowak, Sébastien Bubeck

Precise Asymptotics and Refined Regret of Variance-Aware UCB

In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for the Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates…

Machine Learning · Statistics · 2025-02-18 · Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, Zhengyuan Zhou
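
How UCB-V "incorporates variance estimates" can be seen from its index, which in the form of Audibert, Munos and Szepesvári adds a variance-scaled bonus and a range term to the empirical mean. The sketch below assumes rewards in $[0, b]$, the exploration function $\zeta \ln t$, and the illustrative constants $\zeta = 1.2$, $c = 1$; the function name and defaults are hypothetical choices, not taken from the paper above.

```python
import math


def ucb_v_index(rewards, t, b=1.0, zeta=1.2, c=1.0):
    """UCB-V index of one arm (illustrative sketch).

    rewards: rewards observed from this arm, assumed to lie in [0, b].
    t: current round; the bonus grows like log t.
    """
    s = len(rewards)
    if s == 0:
        return math.inf  # unpulled arms are tried first
    mean = sum(rewards) / s
    var = sum((x - mean) ** 2 for x in rewards) / s  # empirical variance
    e = zeta * math.log(t)  # exploration function E_t = zeta * ln t
    return mean + math.sqrt(2 * var * e / s) + c * 3 * b * e / s


steady = [0.5] * 20          # mean 0.5, zero variance
noisy = [0.0, 1.0] * 10      # mean 0.5, variance 0.25
print(ucb_v_index(steady, t=100), ucb_v_index(noisy, t=100))
```

With equal means and sample sizes, the low-variance arm gets the smaller index, so UCB-V explores it less than plain UCB1 would.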

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB)…

Machine Learning · Computer Science · 2021-10-27 · Anand Kalvit, Assaf Zeevi

Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages

The multi-armed bandit (MAB) problem is a classical problem that models sequential decision-making under uncertainty in reinforcement learning. In this study, we propose a new generalized upper confidence bound (UCB) algorithm (GWA-UCB1) by…

Machine Learning · Computer Science · 2023-08-29 · Nobuhito Manome, Shuji Shinohara, Ung-il Chung

Non-Asymptotic Analysis of a UCB-based Top Two Algorithm

A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, they have received…

Machine Learning · Statistics · 2023-11-08 · Marc Jourdan, Rémy Degenne
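
The leader/challenger mechanism described above can be sketched as follows, assuming unit-variance Gaussian rewards. The leader is the arm with the largest UCB index (the $\sqrt{2 \ln t / N_a}$ bonus is an assumed choice), the challenger minimizes a Gaussian transportation cost to the leader, and the leader is sampled with probability $\beta$. All names and the bonus are hypothetical, and this simplified rule may differ in its details from the algorithm analyzed in the paper.

```python
import math
import random


def ucb_leader(means, counts, t):
    """Leader: arm with the largest UCB index (every arm pulled at least once)."""
    return max(range(len(means)),
               key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))


def tc_challenger(means, counts, leader):
    """Challenger: arm whose Gaussian transportation cost to the leader is smallest."""
    def cost(j):
        gap = max(means[leader] - means[j], 0.0)
        return gap ** 2 / (2 * (1 / counts[leader] + 1 / counts[j]))
    return min((j for j in range(len(means)) if j != leader), key=cost)


def top_two_step(means, counts, t, beta=0.5, rng=random):
    """Return the next arm: the leader w.p. beta, otherwise the challenger."""
    leader = ucb_leader(means, counts, t)
    challenger = tc_challenger(means, counts, leader)
    return leader if rng.random() < beta else challenger


means = [0.2, 0.5, 0.45]   # empirical means
counts = [10, 10, 10]      # pulls per arm
print(top_two_step(means, counts, t=30))  # arm 1 (leader) or arm 2 (challenger)
```

The challenger is the arm most easily confused with the leader, so sampling it concentrates effort where identification is hardest.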

Deep Upper Confidence Bound Algorithm for Contextual Bandit Ranking of Information Selection

Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework where the top-K arms are chosen…

Machine Learning · Computer Science · 2022-01-31 · Michael Rawson, Jade Freeman

Multi-Armed Bandit Problem and Batch UCB Rule

We obtain an upper bound on the loss function of a strategy in the multi-armed bandit problem with Gaussian distributions of incomes. The considered strategy is an asymptotic generalization of the strategy proposed by J. Bather for the…

Statistics Theory · Mathematics · 2019-02-04 · Alexander Kolnogorov, Sergey Garbar

Upper Confidence Bounds for Combining Stochastic Bandits

We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a…

Machine Learning · Computer Science · 2020-12-25 · Ashok Cutkosky, Abhimanyu Das, Manish Purohit
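
The core idea of treating $N$ base bandit algorithms as the arms of a higher-level $N$-armed bandit can be sketched as below. As a plainly named simplification, the higher-level learner here is ordinary UCB1 over each base algorithm's observed rewards, which may differ from the index used by the paper's meta-UCB; the `FixedArm` base class and the `select`/`update` interface are hypothetical.

```python
import math
import random


class FixedArm:
    """Toy base algorithm (hypothetical) that always plays one underlying arm."""

    def __init__(self, arm):
        self.arm = arm

    def select(self):
        return self.arm

    def update(self, arm, reward):
        pass  # a real base algorithm would learn from its own feedback here


def meta_ucb(bases, arm_means, horizon, seed=0):
    """Run UCB1 over base algorithms; only the chosen base sees the reward."""
    rng = random.Random(seed)
    n = len(bases)
    counts = [0] * n   # times each base algorithm was chosen
    sums = [0.0] * n   # reward earned while following each base
    for t in range(1, horizon + 1):
        if t <= n:
            i = t - 1  # try every base algorithm once
        else:
            i = max(range(n), key=lambda b: sums[b] / counts[b]
                    + math.sqrt(2 * math.log(t) / counts[b]))
        arm = bases[i].select()  # the chosen base picks the underlying arm
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli
        bases[i].update(arm, reward)
        counts[i] += 1
        sums[i] += reward
    return counts


counts = meta_ucb([FixedArm(0), FixedArm(1)], [0.2, 0.8], horizon=3000)
print(counts)  # the base that plays the 0.8 arm should be chosen far more often
```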

The Multi-Armed Bandit Problem: An Efficient Non-Parametric Solution

Lai and Robbins (1985) and Lai (1987) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the…

Statistics Theory · Mathematics · 2019-01-17 · Hock Peng Chan

We consider a novel multi-armed bandit framework where the rewards obtained by pulling the arms are functions of a common latent random variable. The correlation between arms due to the common random source can be used to design a…

Machine Learning · Statistics · 2019-01-31 · Samarth Gupta, Gauri Joshi, Osman Yağan

Decentralized Upper Confidence Bound Algorithms for Homogeneous Multi-Agent Multi-Armed Bandits

This paper studies a decentralized homogeneous multi-armed bandit problem in a multi-agent network. The problem is simultaneously solved by $N$ agents assuming they face a common set of $M$ arms and share the same arms' reward…

Machine Learning · Computer Science · 2024-12-31 · Jingxuan Zhu, Ethan Mulle, Christopher S. Smith, Alec Koppel, Ji Liu

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

Multi-armed bandit problems are considered a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…

Statistics Theory · Mathematics · 2008-12-18 · Aurélien Garivier, Eric Moulines

Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits

The principle of optimism in the face of uncertainty is one of the most widely used and successful ideas in multi-armed bandits and reinforcement learning. However, existing optimistic algorithms (primarily UCB and its variants) often…

Machine Learning · Computer Science · 2024-03-12 · Yunbei Xu, Assaf Zeevi

Bootstrapping Upper Confidence Bound

The Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration…

Machine Learning · Statistics · 2019-11-01 · Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

Thresholding Bandits with Augmented UCB

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB…

Machine Learning · Computer Science · 2019-06-11 · Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran

Unified theory of upper confidence bound policies for bandit problems targeting total reward, maximal reward, and more

The upper confidence bound (UCB) policy is recognized as an order-optimal solution for the classical total-reward bandit problem. While similar UCB-based approaches have been applied to the max bandit problem, which aims to maximize the…

Machine Learning · Statistics · 2024-11-04 · Nobuaki Kikkawa, Hiroshi Ohno