Related papers: On Explore-Then-Commit Strategies

200 papers

In this paper, we study multi-armed bandit problems in the explore-then-commit setting. In our proposed explore-then-commit setting, the goal is to identify the best arm after a pure experimentation (exploration) phase and exploit it once or…

Machine Learning · Computer Science 2020-12-16 Ali Yekkehkhany, Ebrahim Arian, Mohammad Hajiesmaili, Rakesh Nagi

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave the highest payoffs in the past and exploring new…

Machine Learning · Computer Science 2012-11-06 Sébastien Bubeck, Nicolò Cesa-Bianchi

The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper we examine the bandit problem under the weaker assumption that the distributions have moments of order $1+\epsilon$,…

Machine Learning · Statistics 2012-09-11 Sébastien Bubeck, Nicolò Cesa-Bianchi, Gábor Lugosi

In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler's…

Machine Learning · Computer Science 2019-06-11 Omar Besbes, Yonatan Gur, Assaf Zeevi

We study the recovering bandits problem, a variant of the stochastic multi-armed bandit problem where the expected reward of each arm varies according to some unknown function of the time since the arm was last played. While being a natural…

Machine Learning · Statistics 2019-11-01 Ciara Pike-Burke, Steffen Grünewälder

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them…

Machine Learning · Computer Science 2015-11-04 Gergely Neu

This paper introduces the framework of multi-armed sampling, which serves as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off…

Machine Learning · Computer Science 2026-05-14 Mohammad Pedramfar, Siamak Ravanbakhsh

The Greedy algorithm is the simplest heuristic for sequential decision problems: it carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

We study a regret minimization problem in the presence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable to or much larger than the time horizon, and…

Machine Learning · Statistics 2020-10-23 Yinglun Zhu, Robert Nowak

We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic innovation: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step,…

Machine Learning · Computer Science 2026-03-24 Junwen Yang, Tianyuan Jin, Vincent Y. F. Tan

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis, Yonathan Efroni, Shie Mannor

Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in learning, and they have countless applications ranging from medical trials, to communication networks, to Web search and advertising.…

Data Structures and Algorithms · Computer Science 2017-09-06 Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins

The purpose of this paper is to provide further insight into the structure of the sequential allocation ("stochastic multi-armed bandit", or MAB) problem by establishing probability one finite horizon bounds and convergence rates for…

Machine Learning · Statistics 2015-12-18 Wesley Cowan, Michael N. Katehakis

We consider minimisation of dynamic regret in non-stationary bandits with a slowly varying property. Namely, we assume that arms' rewards are stochastic and independent over time, but that the absolute difference between the expected…

Machine Learning · Computer Science 2021-10-26 Ramakrishnan Krishnamurthy, Aditya Gopalan

We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded…

Machine Learning · Computer Science 2021-02-09 Shubhada Agrawal, Sandeep Juneja, Wouter M. Koolen

We study a nonstationary bandit problem where rewards depend on both actions and latent states, the latter governed by unknown linear dynamics. Crucially, the state dynamics also depend on the actions, resulting in tension between…

Machine Learning · Computer Science 2025-10-21 Sunmook Choi, Yahya Sattar, Yassir Jedra, Maryam Fazel, Sarah Dean

The multi-armed bandit (MAB) is a classical sequential decision problem. Most work requires assumptions about the reward distribution (e.g., bounded), while practitioners may have difficulty obtaining information about these distributions to…

Machine Learning · Computer Science 2023-12-14 Han Qi, Fei Guo, Li Zhu

We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret,…

Statistics Theory · Mathematics 2010-07-26 Sébastien Bubeck, Rémi Munos, Gilles Stoltz

In the $\mathcal{X}$-armed bandit problem, an agent sequentially interacts with an environment that yields a reward based on the vector input the agent provides. The agent's goal is to maximise the sum of these rewards across some number of time…

Machine Learning · Statistics 2021-01-19 Valeriy Avanesov

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science 2015-09-29 Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama