Related papers: Adaptive Multiple-Arm Identification

The Max $K$-Armed Bandit: A PAC Lower Bound and tighter Algorithms

We consider the Max $K$-Armed Bandit problem, where a learning agent is faced with several sources (arms) of items (rewards), and is interested in finding the best item overall. At each time step the agent chooses an arm, and obtains a random…

Machine Learning · Statistics 2015-08-25 Yahel David, Nahum Shimkin

Complexity Analysis of a Countable-armed Bandit Problem

We consider a stochastic multi-armed bandit (MAB) problem motivated by "large" action spaces, and endowed with a population of arms containing exactly $K$ arm-types, each characterized by a distinct mean reward. The decision maker is…

Machine Learning · Computer Science 2023-01-19 Anand Kalvit, Assaf Zeevi

The Max $K$-Armed Bandit: PAC Lower Bounds and Efficient Algorithms

We consider the Max $K$-Armed Bandit problem, where a learning agent is faced with several stochastic arms, each a source of i.i.d. rewards of unknown distribution. At each time step the agent chooses an arm, and observes the reward of the…

Machine Learning · Statistics 2015-12-25 Yahel David, Nahum Shimkin

Exploring $k$ out of Top $\rho$ Fraction of Arms in Stochastic Bandits

This paper studies the problem of identifying any $k$ distinct arms among the top $\rho$ fraction (e.g., top 5%) of arms from a finite or infinite set with a probably approximately correct (PAC) tolerance $\epsilon$. We consider two cases:…

Machine Learning · Computer Science 2020-11-20 Wenbo Ren, Jia Liu, Ness Shroff

PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits

We consider the problem of identifying any $k$ out of the best $m$ arms in an $n$-armed stochastic multi-armed bandit. Framed in the PAC setting, this particular problem generalises both the problem of 'best subset selection' and that of…

Machine Learning · Computer Science 2019-01-25 Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

Practical Algorithms for Best-K Identification in Multi-Armed Bandits

In the Best-$K$ identification problem (Best-$K$-Arm), we are given $N$ stochastic bandit arms with unknown reward distributions. Our goal is to identify the $K$ arms with the largest means with high confidence, by drawing samples from the…

Machine Learning · Computer Science 2017-05-22 Haotian Jiang, Jian Li, Mingda Qiao

On Regret with Multiple Best Arms

We study a regret minimization problem with the existence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable to or much larger than the time horizon, and…

Machine Learning · Statistics 2020-10-23 Yinglun Zhu, Robert Nowak

Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme

We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private)…

Machine Learning · Statistics 2022-12-06 Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Or Sheffet, Anand D. Sarwate

Maximal Objectives in the Multi-armed Bandit with Applications

In several applications of the stochastic multi-armed bandit problem, the traditional objective of maximizing the expected total reward can be inappropriate. In this paper, motivated by certain operational concerns in online platforms, we…

Machine Learning · Computer Science 2024-10-16 Eren Ozbay, Vijay Kamble

Non-stochastic Best Arm Identification and Hyperparameter Optimization

Motivated by the task of hyperparameter optimization, we introduce the non-stochastic best-arm identification problem. Within the multi-armed bandit literature, the cumulative regret objective enjoys algorithms and analyses for both the…

Machine Learning · Computer Science 2015-03-02 Kevin Jamieson, Ameet Talwalkar

Best Arm Identification with Minimal Regret

Motivated by real-world applications that necessitate responsible experimentation, we introduce the problem of best arm identification (BAI) with minimal regret. This innovative variant of the multi-armed bandit problem elegantly…

Machine Learning · Computer Science 2024-09-30 Junwen Yang, Vincent Y. F. Tan, Tianyuan Jin

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of its…

Machine Learning · Computer Science 2022-03-09 Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic

Constrained regret minimization for multi-criterion multi-armed bandits

We consider a stochastic multi-armed bandit setting and study the problem of constrained regret minimization over a given time horizon. Each arm is associated with an unknown, possibly multi-dimensional distribution, and the merit of an arm…

Machine Learning · Computer Science 2023-01-05 Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan

Bandit algorithms to emulate human decision making using probabilistic distortions

Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward…

Machine Learning · Computer Science 2023-11-01 Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

Streaming Algorithms for Stochastic Multi-armed Bandits

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time is bounded. The decision-maker can only pull…

Machine Learning · Computer Science 2020-12-10 Arnab Maiti, Vishakha Patil, Arindam Khan

Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

In this paper, we propose a constant word (RAM model) algorithm for regret minimisation for both finite and infinite Stochastic Multi-Armed Bandit (MAB) instances. Most of the existing regret minimisation algorithms need to remember the…

Machine Learning · Computer Science 2019-01-25 Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pairs of decision points queried in an online…

Machine Learning · Computer Science 2022-02-15 Aadirupa Saha, Pierre Gaillard

Rate-optimal Bayesian Simple Regret in Best Arm Identification

We consider best arm identification in the multi-armed bandit problem. Assuming certain continuity conditions of the prior, we characterize the rate of the Bayesian simple regret. Differing from Bayesian regret minimization (Lai, 1987), the…

Machine Learning · Computer Science 2023-07-27 Junpei Komiyama, Kaito Ariu, Masahiro Kato, Chao Qin

We consider a novel multi-armed bandit framework where the rewards obtained by pulling the arms are functions of a common latent random variable. The correlation between arms due to the common random source can be used to design a…

Machine Learning · Statistics 2019-01-31 Samarth Gupta, Gauri Joshi, Osman Yağan

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period the reward interval) and the player receives partial rewards…

Machine Learning · Computer Science 2020-12-16 Siwei Wang, Haoyun Wang, Longbo Huang