Related papers: Differentiable Linear Bandit Algorithm

Bootstrapping Upper Confidence Bound

Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration…

Machine Learning · Statistics 2019-11-01 Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be…

Machine Learning · Statistics 2013-12-30 Kevin Jamieson, Matthew Malloy, Robert Nowak, Sébastien Bubeck

UCB algorithms for multi-armed bandits: Precise regret and adaptive inference

Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax…

Statistics Theory · Mathematics 2024-12-10 Qiyang Han, Koulik Khamaru, Cun-Hui Zhang

Deep Upper Confidence Bound Algorithm for Contextual Bandit Ranking of Information Selection

Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework where the top-K arms are chosen…

Machine Learning · Computer Science 2022-01-31 Michael Rawson, Jade Freeman

Tuning Confidence Bound for Stochastic Bandits with Bandit Distance

We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance…

Machine Learning · Statistics 2021-10-07 Xinyu Zhang, Srinjoy Das, Ken Kreutz-Delgado

A UCB Bandit Algorithm for General ML-Based Estimators

We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary machine learning models into multi-armed bandit frameworks. A fundamental challenge in deploying sophisticated ML models for sequential…

Machine Learning · Computer Science 2026-01-07 Yajing Liu, Erkao Bao, Linqi Song

Multi-Armed Bandit Problem and Batch UCB Rule

We obtain the upper bound of the loss function for a strategy in the multi-armed bandit problem with Gaussian distributions of incomes. Considered strategy is an asymptotic generalization of the strategy proposed by J. Bather for the…

Statistics Theory · Mathematics 2019-02-04 Alexander Kolnogorov, Sergey Garbar

Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits

Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit…

Machine Learning · Computer Science 2024-06-11 Ambrus Tamás, Szabolcs Szentpéteri, Balázs Csanád Csáji

Inference with the Upper Confidence Bound Algorithm

In this paper, we discuss the asymptotic behavior of the Upper Confidence Bound (UCB) algorithm in the context of multiarmed bandit problems and discuss its implication in downstream inferential tasks. While inferential tasks become…

Machine Learning · Statistics 2024-08-09 Koulik Khamaru, Cun-Hui Zhang

Hierarchical Upper Confidence Bounds for Constrained Online Learning

The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation.…

Machine Learning · Computer Science 2024-10-28 Ali Baheri

Upper Confidence Bounds for Combining Stochastic Bandits

We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a…

Machine Learning · Computer Science 2020-12-25 Ashok Cutkosky, Abhimanyu Das, Manish Purohit

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…

Statistics Theory · Mathematics 2008-12-18 Aurélien Garivier, Eric Moulines

Stochastic Multi-Armed Bandits with Limited Control Variates

Motivated by wireless networks where interference or channel state estimates provide partial insight into throughput, we study a variant of the classical stochastic multi-armed bandit problem in which the learner has limited access to…

Machine Learning · Computer Science 2026-03-03 Arun Verma, Manjesh Kumar Hanawal, Arun Rajkumar

Replicable Bandits with UCB based Exploration

We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal randomness but…

Machine Learning · Computer Science 2026-04-23 Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee

Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages

The multi-armed bandit (MAB) problem is a classical problem that models sequential decision-making under uncertainty in reinforcement learning. In this study, we propose a new generalized upper confidence bound (UCB) algorithm (GWA-UCB1) by…

Machine Learning · Computer Science 2023-08-29 Nobuhito Manome, Shuji Shinohara, Ung-il Chung

Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits

By leveraging the representation power of deep neural networks, neural upper confidence bound (UCB) algorithms have shown success in contextual bandits. To further balance the exploration and exploitation, we propose…

Machine Learning · Computer Science 2025-03-12 Ha Manh Bui, Enrique Mallada, Anqi Liu

Thresholding Bandits with Augmented UCB

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB…

Machine Learning · Computer Science 2019-06-11 Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran

Regional Multi-Armed Bandits

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when…

Machine Learning · Computer Science 2018-02-23 Zhiyang Wang, Ruida Zhou, Cong Shen

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and…

Machine Learning · Computer Science 2020-03-24 Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton

Decentralized Upper Confidence Bound Algorithms for Homogeneous Multi-Agent Multi-Armed Bandits

This paper studies a decentralized homogeneous multi-armed bandit problem in a multi-agent network. The problem is simultaneously solved by $N$ agents assuming they face a common set of $M$ arms and share the same arms' reward…

Machine Learning · Computer Science 2024-12-31 Jingxuan Zhu, Ethan Mulle, Christopher S. Smith, Alec Koppel, Ji Liu