Related papers: Bandit problems with Levy payoff processes
Bandit problems model the trade-off between exploration and exploitation in various decision problems. We study two-armed bandit problems in continuous time, where the risky arm can have two types: High or Low; both types yield stochastic…
One of two independent stochastic processes (arms) are to be selected at each of n stages. The selection is sequential and depends on past observations as well as the prior information. Observations from arm i are independent given a…
We consider a stochastic multi-armed bandit setting where reward must be actively queried for it to be observed. We provide tight lower and upper problem-dependent guarantees on both the regret and the number of queries. Interestingly, we…
We analyze undiscounted continuous-time games of strategic experimentation with two-armed bandits. The risky arm generates payoffs according to a L\'{e}vy process with an unknown average payoff per unit of time which nature draws from an…
We present a two-armed bandit model of decision making under uncertainty where the expected return to investing in the "risky arm" increases when choosing that arm and decreases when choosing the "safe" arm. These dynamics are natural in…
A recent work by Schlisselberg et al. (2024) studies a delay-as-payoff model for stochastic multi-armed bandits, where the payoff (either loss or reward) is delayed for a period that is proportional to the payoff itself. While this captures…
We study the optimal dividend problem in the dual model where dividend payments can only be made at the jump times of an independent Poisson process. In this context, Avanzi et al. [5] solved the case with i.i.d. hyperexponential jumps;…
In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a…
This paper studies a class of optimal multiple stopping problems driven by L\'evy processes. Our model allows for a negative effective discount rate, which arises in a number of financial applications, including stock loans and real…
An extension of the traditional two-armed bandit problem is considered, in which the decision maker has access to some side information before deciding which arm to pull. At each time t, before making a selection, the decision maker is able…
We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…
We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…
In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the…
We consider a sequential decision-making problem where an agent can take one action at a time and each action has a stochastic temporal extent, i.e., a new action cannot be taken until the previous one is finished. Upon completion, the…
Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, the distributions of the rewards do not…
The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performing clinical trials, and hyperparameter selection…
The Greedy algorithm is the simplest heuristic in sequential decision problem that carelessly takes the locally optimal choice at each round, disregarding any advantages of exploring and/or information gathering. Theoretically, it is known…
Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled.…
Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable)…
Decision making under uncertain environments in the maximization of expected reward while minimizing its risk is one of the ubiquitous problems in many subjects. Here, we introduce a novel problem setting in stochastic bandit optimization…