Related papers: An Index-based Deterministic Asymptotically Optima…

200 papers

For the stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the $\epsilon_t$-greedy strategy. We…

Optimization and Control · Mathematics 2018-05-04 Hyeong Soo Chang

In multi-armed bandit problems, the typical goal is to identify the arm with the highest reward. This paper explores a threshold-based bandit problem, aiming to select an arm based on its relation to a prescribed threshold $\tau$. We…

Machine Learning · Computer Science 2025-09-03 Chanakya Varude, Jay Chaudhary, Siddharth Kaushik, Prasanna Chaporkar

We study a specific combinatorial pure exploration stochastic bandit problem where the learner aims at finding the set of arms whose means are above a given threshold, up to a given precision, and for a fixed time horizon.…

Machine Learning · Statistics 2016-05-30 Andrea Locatelli, Maurilio Gutzeit, Alexandra Carpentier

We study a generalization of the multi-armed bandit problem with multiple plays where there is a cost associated with pulling each arm and the agent has a budget at each time that dictates how much she can expect to spend. We derive an…

Machine Learning · Statistics 2019-09-13 Alexander Luedtke, Emilie Kaufmann, Antoine Chambaz

We adopt an optimal-control framework for addressing the undiscounted infinite-horizon discrete-time restless $N$-armed bandit problem. Unlike most studies that rely on constructing policies based on the relaxed single-armed Markov Decision…

Optimization and Control · Mathematics 2024-03-19 Chen YAN

This paper considers the multi-armed thresholding bandit problem -- identifying all arms whose expected rewards are above a predefined threshold via as few pulls (or rounds) as possible -- recently proposed by Locatelli et al. [2016].…

Machine Learning · Statistics 2017-07-11 Jie Zhong, Yijun Huang, Ji Liu

We consider a multi-armed bandit setting with finitely many arms, in which each arm yields an $M$-dimensional vector reward upon selection. We assume that the reward of each dimension (a.k.a. objective) is generated independently of…

Machine Learning · Computer Science 2025-01-24 Zhirui Chen, P. N. Karthik, Yeow Meng Chee, Vincent Y. F. Tan

We consider restless multi-armed bandit (RMAB) with a finite horizon and multiple pulls per period. Leveraging the Lagrangian relaxation, we approximate the problem with a collection of single arm problems. We then propose an index-based…

Optimization and Control · Mathematics 2017-07-04 Weici Hu, Peter Frazier

We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$…

Machine Learning · Computer Science 2023-06-06 Xiao-Yue Gong, Mark Sellke

We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our…

Machine Learning · Computer Science 2026-03-31 Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang

We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup, when the arms are subject to linear constraints. Unlike the standard best-arm identification problem which is well…

Machine Learning · Computer Science 2024-01-26 Emil Carlsson, Debabrota Basu, Fredrik D. Johansson, Devdatt Dubhashi

We consider the question introduced by Mason et al. (2020) of identifying all the $\varepsilon$-optimal arms in a finite stochastic multi-armed bandit with Gaussian rewards. We give two lower bounds on the sample complexity of any algorithm…

Machine Learning · Statistics 2022-04-07 Aymen Al Marjani, Tomáš Kocák, Aurélien Garivier

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist…

Machine Learning · Statistics 2021-07-05 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We consider the infinite-horizon average-reward restless bandit problem. We propose a novel two-set policy that maintains two dynamic subsets of arms: one subset of arms has a nearly optimal state distribution and takes actions…

Machine Learning · Computer Science 2024-10-18 Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang

We study the problem of identifying the best arm in a multi-armed bandit environment when each arm is a time-homogeneous and ergodic discrete-time Markov process on a common, finite state space. The state evolution on each arm is governed…

Machine Learning · Statistics 2022-03-30 P. N. Karthik, Kota Srinivas Reddy, Vincent Y. F. Tan

We study the problem of best-arm identification with fixed confidence in stochastic linear bandits. The objective is to identify the best arm with a given level of certainty while minimizing the sampling budget. We devise a simple algorithm…

Machine Learning · Statistics 2020-06-30 Yassir Jedra, Alexandre Proutiere

We propose the minimum empirical divergence (MED) policy for the multiarmed bandit problem. We prove asymptotic optimality of the proposed policy for the case of finite support models. In our setting, Burnetas and Katehakis have already proposed…

Statistics Theory · Mathematics 2011-11-21 Junya Honda, Akimichi Takemura

In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit the structure of the problem and have been shown to be asymptotically suboptimal. In this paper, we follow recent approaches of deriving…

Machine Learning · Computer Science 2020-11-23 Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients, when each client has access to a subset of arms and each arm yields independent Gaussian observations. The goal is…

Machine Learning · Computer Science 2023-12-20 Zhirui Chen, P. N. Karthik, Vincent Y. F. Tan, Yeow Meng Chee

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama