Related papers: Bandit problems with Levy payoff processes

Bandit problems with Levy processes

Bandit problems model the trade-off between exploration and exploitation in various decision problems. We study two-armed bandit problems in continuous time, where the risky arm can have two types: High or Low; both types yield stochastic…

Probability · Mathematics 2015-08-23 Asaf Cohen, Eilon Solan

Prior Ordering and Monotonicity in Dirichlet Bandits

One of two independent stochastic processes (arms) is to be selected at each of n stages. The selection is sequential and depends on past observations as well as the prior information. Observations from arm i are independent given a…

Statistics Theory · Mathematics 2011-01-26 Yaming Yu

Query-Reward Tradeoffs in Multi-Armed Bandits

We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…

Machine Learning · Computer Science 2022-10-28 Nadav Merlis, Yonathan Efroni, Shie Mannor

Undiscounted Bandit Games

We analyze undiscounted continuous-time games of strategic experimentation with two-armed bandits. The risky arm generates payoffs according to a Lévy process with an unknown average payoff per unit of time which nature draws from an…

Theoretical Economics · Economics 2020-08-26 Godfrey Keller, Sven Rady

Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability

We present a two-armed bandit model of decision making under uncertainty where the expected return to investing in the "risky arm" increases when choosing that arm and decreases when choosing the "safe" arm. These dynamics are natural in…

Optimization and Control · Mathematics 2017-03-22 Roland Fryer, Philipp Harms

Contextual Linear Bandits with Delay as Payoff

A recent work by Schlisselberg et al. (2024) studies a delay-as-payoff model for stochastic multi-armed bandits, where the payoff (either loss or reward) is delayed for a period that is proportional to the payoff itself. While this captures…

Machine Learning · Computer Science 2025-02-21 Mengxiao Zhang, Yingfei Wang, Haipeng Luo

On the optimality of Periodic barrier strategies for a spectrally positive Lévy process

We study the optimal dividend problem in the dual model where dividend payments can only be made at the jump times of an independent Poisson process. In this context, Avanzi et al. [5] solved the case with i.i.d. hyperexponential jumps;…

Probability · Mathematics 2017-08-15 José-Luis Pérez, Kazutoshi Yamazaki

Stochastic Bandit Based on Empirical Moments

In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…

Statistics Theory · Mathematics 2013-03-29 Junya Honda, Akimichi Takemura

Optimal Multiple Stopping with Negative Discount Rate and Random Refraction Times under Levy Models

This paper studies a class of optimal multiple stopping problems driven by Lévy processes. Our model allows for a negative effective discount rate, which arises in a number of financial applications, including stock loans and real…

Mathematical Finance · Quantitative Finance 2016-03-11 Tim Leung, Kazutoshi Yamazaki, Hongzhong Zhang

Bandit Problems with Side Observations

An extension of the traditional two-armed bandit problem is considered, in which the decision maker has access to some side information before deciding which arm to pull. At each time t, before making a selection, the decision maker is able…

Information Theory · Computer Science 2007-07-16 Chih-Chun Wang, Sanjeev R. Kulkarni, H. Vincent Poor

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…

Machine Learning · Computer Science 2025-01-28 Ahmed Ben Yahmed, Clément Calauzènes, Vianney Perchet

Blocking Bandits

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…

Machine Learning · Computer Science 2024-07-31 Soumya Basu, Rajat Sen, Sujay Sanghavi, Sanjay Shakkottai

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the…

Machine Learning · Computer Science 2022-06-28 Yifan Lin, Yuhao Wang, Enlu Zhou

A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option

We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the…

Machine Learning · Computer Science 2020-03-26 P Sharoff, Nishant A. Mehta, Ravi Ganti

On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…

Statistics Theory · Mathematics 2008-12-18 Aurélien Garivier, Eric Moulines

Algorithm Design and Stronger Guarantees for the Improving Multi-Armed Bandits Problem

The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection…

Machine Learning · Computer Science 2026-05-22 Avrim Blum, Marten Garicano, Kavya Ravichandran, Dravyansh Sharma

Be Greedy in Multi-Armed Bandits

The Greedy algorithm is the simplest heuristic in sequential decision problems that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…

Machine Learning · Computer Science 2021-01-05 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Stochastic Bandits with Delay-Dependent Payoffs

Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.…

Machine Learning · Statistics 2020-02-20 Leonardo Cella, Nicolò Cesa-Bianchi

Combinatorial Blocking Bandits with Stochastic Delays

Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable)…

Machine Learning · Computer Science 2021-05-25 Alexia Atsidakou, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, Sanjay Shakkottai

Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence

Decision making under uncertain environments in the maximization of expected reward while minimizing its risk is one of the ubiquitous problems in many subjects. Here, we introduce a novel problem setting in stochastic bandit optimization…

Machine Learning · Computer Science 2025-10-27 Shunta Nonaga, Koji Tabata, Yuta Mizuno, Tamiki Komatsuzaki