Related papers: Differentiable Linear Bandit Algorithm

200 papers

The Upper Confidence Bound (UCB) method is arguably the most celebrated method for online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration…

Machine Learning · Statistics 2019-11-01 Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be…

Machine Learning · Statistics 2013-12-30 Kevin Jamieson, Matthew Malloy, Robert Nowak, Sébastien Bubeck

Upper Confidence Bound (UCB) algorithms are a widely-used class of sequential algorithms for the $K$-armed bandit problem. Despite extensive research over the past decades aimed at understanding their asymptotic and (near) minimax…

Statistics Theory · Mathematics 2024-12-10 Qiyang Han, Koulik Khamaru, Cun-Hui Zhang

Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework where the top-K arms are chosen…

Machine Learning · Computer Science 2022-01-31 Michael Rawson, Jade Freeman

We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others. Our UCB distance…

Machine Learning · Statistics 2021-10-07 Xinyu Zhang, Srinjoy Das, Ken Kreutz-Delgado

We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary machine learning models into multi-armed bandit frameworks. A fundamental challenge in deploying sophisticated ML models for sequential…

Machine Learning · Computer Science 2026-01-07 Yajing Liu, Erkao Bao, Linqi Song

We obtain the upper bound of the loss function for a strategy in the multi-armed bandit problem with Gaussian distributions of incomes. The considered strategy is an asymptotic generalization of the strategy proposed by J. Bather for the…

Statistics Theory · Mathematics 2019-02-04 Alexander Kolnogorov, Sergey Garbar

Stochastic multi-armed bandits (MABs) provide a fundamental reinforcement learning model to study sequential decision making in uncertain environments. The upper confidence bounds (UCB) algorithm gave birth to the renaissance of bandit…

Machine Learning · Computer Science 2024-06-11 Ambrus Tamás, Szabolcs Szentpéteri, Balázs Csanád Csáji

In this paper, we discuss the asymptotic behavior of the Upper Confidence Bound (UCB) algorithm in the context of multiarmed bandit problems and discuss its implication in downstream inferential tasks. While inferential tasks become…

Machine Learning · Statistics 2024-08-09 Koulik Khamaru, Cun-Hui Zhang

The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation.…

Machine Learning · Computer Science 2024-10-28 Ali Baheri

We provide a simple method to combine stochastic bandit algorithms. Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem that we solve with a…

Machine Learning · Computer Science 2020-12-25 Ashok Cutkosky, Abhimanyu Das, Manish Purohit

Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…

Statistics Theory · Mathematics 2008-12-18 Aurélien Garivier, Eric Moulines

Motivated by wireless networks where interference or channel state estimates provide partial insight into throughput, we study a variant of the classical stochastic multi-armed bandit problem in which the learner has limited access to…

Machine Learning · Computer Science 2026-03-03 Arun Verma, Manjesh Kumar Hanawal, Arun Rajkumar

We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal randomness but…

Machine Learning · Computer Science 2026-04-23 Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee

The multi-armed bandit (MAB) problem is a classical problem that models sequential decision-making under uncertainty in reinforcement learning. In this study, we propose a new generalized upper confidence bound (UCB) algorithm (GWA-UCB1) by…

Machine Learning · Computer Science 2023-08-29 Nobuhito Manome, Shuji Shinohara, Ung-il Chung

By leveraging the representation power of deep neural networks, neural upper confidence bound (UCB) algorithms have shown success in contextual bandits. To further balance the exploration and exploitation, we propose…

Machine Learning · Computer Science 2025-03-12 Ha Manh Bui, Enrique Mallada, Anqi Liu

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB…

Machine Learning · Computer Science 2019-06-11 Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter. The arms are divided into different groups, each of which has a common parameter. Therefore, when…

Machine Learning · Computer Science 2018-02-23 Zhiyang Wang, Ruida Zhou, Cong Shen

We propose $\tt RandUCB$, a bandit strategy that builds on theoretically derived confidence intervals similar to upper confidence bound (UCB) algorithms, but akin to Thompson sampling (TS), it uses randomization to trade off exploration and…

Machine Learning · Computer Science 2020-03-24 Sharan Vaswani, Abbas Mehrabian, Audrey Durand, Branislav Kveton

This paper studies a decentralized homogeneous multi-armed bandit problem in a multi-agent network. The problem is simultaneously solved by $N$ agents assuming they face a common set of $M$ arms and share the same arms' reward…

Machine Learning · Computer Science 2024-12-31 Jingxuan Zhu, Ethan Mulle, Christopher S. Smith, Alec Koppel, Ji Liu