Related papers: Bandit problems with Levy processes

Bandit problems with Levy payoff processes

We study two-armed Levy bandits in continuous-time, which have one safe arm that yields a constant payoff s, and one risky arm that can be either of type High or Low; both types yield stochastic payoffs generated by a Levy process. The…

Probability · Mathematics 2009-06-05 Asaf Cohen, Eilon Solan

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new…

Machine Learning · Computer Science 2012-11-06 Sébastien Bubeck, Nicolò Cesa-Bianchi

Query-Reward Tradeoffs in Multi-Armed Bandits

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis, Yonathan Efroni, Shie Mannor

Bandit Problems with Side Observations

An extension of the traditional two-armed bandit problem is considered, in which the decision maker has access to some side information before deciding which arm to pull. At each time t, before making a selection, the decision maker is able…

Information Theory · Computer Science 2007-07-16 Chih-Chun Wang, Sanjeev R. Kulkarni, H. Vincent Poor

Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability

We present a two-armed bandit model of decision making under uncertainty where the expected return to investing in the "risky arm" increases when choosing that arm and decreases when choosing the "safe" arm. These dynamics are natural in…

Optimization and Control · Mathematics 2017-03-22 Roland Fryer, Philipp Harms

Stochastic Bandit Based on Empirical Moments

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms to handle the explore-exploit trade-off in linear bandits problems as well as generalized…

Machine Learning · Computer Science 2020-10-09 Yu-Heng Hung, Ping-Chun Hsieh, Xi Liu, P. R. Kumar

Approximate optimality and the risk/reward tradeoff in a class of bandit problems

This paper studies a sequential decision problem where payoff distributions are known and where the riskiness of payoffs matters. Equivalently, it studies sequential choice from a repeated set of independent lotteries. The decision-maker is…

Theoretical Economics · Economics 2024-01-02 Zengjing Chen, Larry G. Epstein, Guodong Zhang

The Multi-Armed Bandit, with Constraints

The early sections of this paper present an analysis of a Markov decision model that is known as the multi-armed bandit under the assumption that the utility function of the decision maker is either linear or exponential. The analysis…

Optimization and Control · Mathematics 2012-03-22 Eric V. Denardo, Eugene A. Feinberg, Uriel G. Rothblum

A new approach to Poissonian two-armed bandit problem

We consider a continuous-time two-armed bandit problem in which incomes are described by Poissonian processes. We develop a Bayesian approach with an arbitrary prior distribution. We present two versions of a recursive equation for determination…

Statistics Theory · Mathematics 2019-07-16 Alexander Kolnogorov

Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement…

Machine Learning · Statistics 2022-02-23 Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady

Bandits and Experts in Metric Spaces

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite…

Data Structures and Algorithms · Computer Science 2019-04-16 Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal

Be Greedy in Multi-Armed Bandits

The Greedy algorithm is the simplest heuristic in sequential decision problems: it carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Fractional Moments on Bandit Problems

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments…

Machine Learning · Computer Science 2012-02-20 Ananda Narayanan B, Balaraman Ravindran

An empirical evaluation of active inference in multi-armed bandits

A key feature of sequential decision making under uncertainty is a need to balance between exploiting--choosing the best action according to the current knowledge, and exploring--obtaining information about values of other actions. The…

Machine Learning · Computer Science 2021-08-27 Dimitrije Markovic, Hrvoje Stojic, Sarah Schwoebel, Stefan J. Kiebel

Multi-armed bandit problem with precedence relations

Consider a multi-phase project management problem where the decision maker needs to deal with two issues: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward…

Statistics Theory · Mathematics 2007-06-13 Hock Peng Chan, Cheng-Der Fuh, Inchi Hu

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…

Machine Learning · Computer Science 2025-01-28 Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

Best Arm Identification in Batched Multi-armed Bandit Problems

Recently, the multi-armed bandit problem has arisen in many real-life scenarios where arms must be sampled in batches, due to the limited time the agent can wait for feedback. Such applications include biological experimentation and online…

Machine Learning · Statistics 2023-12-22 Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan

Linearly Parameterized Bandits

We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an $r$-dimensional random vector $\mathbf{Z} \in \mathbb{R}^r$, where $r \geq 2$. The…

Machine Learning · Computer Science 2010-02-24 Paat Rusmevichientong, John N. Tsitsiklis

A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions…

Machine Learning · Computer Science 2018-01-11 Sampath Kannan, Jamie Morgenstern, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu