Related papers: Bandit problems with Levy processes

200 papers

We study two-armed Levy bandits in continuous time, which have one safe arm that yields a constant payoff s, and one risky arm that can be either of type High or Low; both types yield stochastic payoffs generated by a Levy process. The…

Probability · Mathematics 2009-06-05 Asaf Cohen, Eilon Solan

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new…

Machine Learning · Computer Science 2012-11-06 Sébastien Bubeck, Nicolò Cesa-Bianchi

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis, Yonathan Efroni, Shie Mannor

An extension of the traditional two-armed bandit problem is considered, in which the decision maker has access to some side information before deciding which arm to pull. At each time t, before making a selection, the decision maker is able…

Information Theory · Computer Science 2007-07-16 Chih-Chun Wang, Sanjeev R. Kulkarni, H. Vincent Poor

We present a two-armed bandit model of decision making under uncertainty where the expected return to investing in the "risky arm" increases when choosing that arm and decreases when choosing the "safe" arm. These dynamics are natural in…

Optimization and Control · Mathematics 2017-03-22 Roland Fryer, Philipp Harms

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandit problems as well as generalized…

Machine Learning · Computer Science 2020-10-09 Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

This paper studies a sequential decision problem where payoff distributions are known and where the riskiness of payoffs matters. Equivalently, it studies sequential choice from a repeated set of independent lotteries. The decision-maker is…

Theoretical Economics · Economics 2024-01-02 Zengjing Chen, Larry G. Epstein, Guodong Zhang

The early sections of this paper present an analysis of a Markov decision model that is known as the multi-armed bandit under the assumption that the utility function of the decision maker is either linear or exponential. The analysis…

Optimization and Control · Mathematics 2012-03-22 Eric V. Denardo, Eugene A. Feinberg, Uriel G. Rothblum

We consider a continuous-time two-armed bandit problem in which incomes are described by Poisson processes. We develop a Bayesian approach with an arbitrary prior distribution. We present two versions of a recursive equation for determination…

Statistics Theory · Mathematics 2019-07-16 Alexander Kolnogorov

We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement…

Machine Learning · Statistics 2022-02-23 Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite…

Data Structures and Algorithms · Computer Science 2019-04-16 Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal

The Greedy algorithm is the simplest heuristic in sequential decision problems: it carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments…

Machine Learning · Computer Science 2012-02-20 Ananda Narayanan B, Balaraman Ravindran

A key feature of sequential decision making under uncertainty is a need to balance between exploiting--choosing the best action according to the current knowledge, and exploring--obtaining information about values of other actions. The…

Machine Learning · Computer Science 2021-08-27 Dimitrije Markovic, Hrvoje Stojic, Sarah Schwoebel, Stefan J. Kiebel

Consider a multi-phase project management problem where the decision maker needs to deal with two issues: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward…

Statistics Theory · Mathematics 2007-06-13 Hock Peng Chan, Cheng-Der Fuh, Inchi Hu

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…

Machine Learning · Computer Science 2025-01-28 Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

Recently, the multi-armed bandit problem has arisen in many real-life scenarios where arms must be sampled in batches, due to the limited time the agent can wait for feedback. Such applications include biological experimentation and online…

Machine Learning · Statistics 2023-12-22 Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan

We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an $r$-dimensional random vector $\mathbf{Z} \in \mathbb{R}^r$, where $r \geq 2$. The…

Machine Learning · Computer Science 2010-02-24 Paat Rusmevichientong, John N. Tsitsiklis

Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions…

Machine Learning · Computer Science 2018-01-11 Sampath Kannan, Jamie Morgenstern, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu