Related papers: Efficient-UCBV: An Almost Optimal Algorithm using …

200 papers

Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax…

Statistics Theory · Mathematics 2024-12-10 Qiyang Han, Koulik Khamaru, Cun-Hui Zhang

In this work, we address the open problem of finding low-complexity near-optimal multi-armed bandit algorithms for sequential decision making problems. Existing bandit algorithms are either sub-optimal and computationally simple (e.g.,…

Machine Learning · Computer Science 2018-04-18 Fang Liu, Sinong Wang, Swapna Buccapatnam, Ness Shroff

In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for the Multi-Armed Bandit (MAB) problems, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates…

Machine Learning · Statistics 2025-02-18 Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, Zhengyuan Zhou

Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit…

Machine Learning · Computer Science 2024-06-11 Ambrus Tamás, Szabolcs Szentpéteri, Balázs Csanád Csáji

One of the key drivers of complexity in the classical (stochastic) multi-armed bandit (MAB) problem is the difference between mean rewards in the top two arms, also known as the instance gap. The celebrated Upper Confidence Bound (UCB)…

Machine Learning · Computer Science 2021-10-27 Anand Kalvit, Assaf Zeevi

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret. Besides the theoretical results, the new algorithm is simple, efficient and…

Machine Learning · Computer Science 2016-02-25 Tor Lattimore

I introduce and analyse an anytime version of the Optimally Confident UCB (OCUCB) algorithm designed for minimising the cumulative regret in finite-armed stochastic bandits with subgaussian noise. The new algorithm is simple, intuitive (in…

Machine Learning · Computer Science 2016-05-09 Tor Lattimore

In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive the regret bounds corresponding to the convergence…

Machine Learning · Computer Science 2024-02-13 Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Andrey Pudovikov

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where…

Machine Learning · Statistics 2019-01-25 Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

The Multi-Armed Bandit (MAB) problem is challenging in non-stationary environments where reward distributions evolve dynamically. We introduce RAVEN-UCB, a novel algorithm that combines theoretical rigor with practical efficiency via…

Machine Learning · Computer Science 2025-06-04 Junyi Fang, Yuxun Chen, Yuxin Chen, Chen Zhang

This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms…

Machine Learning · Computer Science 2022-06-15 Shinji Ito, Taira Tsuchiya, Junya Honda

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope 2009, Yu 2011). The set of arms is either…

Machine Learning · Computer Science 2014-05-21 Richard Combes, Alexandre Proutiere

Upper Confidence Bound (UCB) method is arguably the most celebrated one used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration…

Machine Learning · Statistics 2019-11-01 Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

This paper studies a new variant of the stochastic multi-armed bandits problem where auxiliary information about the arm rewards is available in the form of control variates. In many applications like queuing and wireless networks, the arm…

Machine Learning · Computer Science 2022-01-19 Arun Verma, Manjesh K. Hanawal

The regret lower bound of Lai and Robbins (1985), the gold standard for checking optimality of bandit algorithms, considers arm size fixed as sample size goes to infinity. We show that when arm size increases polynomially with sample size,…

Statistics Theory · Mathematics 2019-09-06 Hock Peng Chan, Shouri Hu

Motivated by wireless networks where interference or channel state estimates provide partial insight into throughput, we study a variant of the classical stochastic multi-armed bandit problem in which the learner has limited access to…

Machine Learning · Computer Science 2026-03-03 Arun Verma, Manjesh Kumar Hanawal, Arun Rajkumar

We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a…

Machine Learning · Computer Science 2020-12-25 Ashok Cutkosky, Abhimanyu Das, Manish Purohit

Motivated by economic applications such as recommender systems, we study the behavior of stochastic bandits algorithms under strategic behavior conducted by rational actors, i.e., the arms. Each arm is a self-interested…

Machine Learning · Computer Science 2020-11-16 Zhe Feng, David C. Parkes, Haifeng Xu

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB…

Machine Learning · Computer Science 2019-06-11 Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran

We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal randomness but…

Machine Learning · Computer Science 2026-04-23 Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee