Related papers: Incentive Decision Processes
We study the problem of designing an optimal sequence of incentives that a principal should offer to an agent so that the agent's optimal behavior under the incentives realizes the principal's objective expressed as a temporal logic…
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios…
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a…
We study principal-agent problems where a farsighted agent takes costly actions in an MDP. The core challenge in these settings is that agent's actions are hidden to the principal, who can only observe their outcomes, namely state…
In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy…
An algorithmic decision-maker incentivizes people to act in certain ways to receive better decisions. These incentives can dramatically influence subjects' behaviors and lives, and it is important that both decision-makers and…
In many scenarios, a principal dynamically interacts with an agent and offers a sequence of incentives to align the agent's behavior with a desired objective. This paper focuses on the problem of synthesizing an incentive sequence that,…
Interval Markov decision processes are a class of Markov models where the transition probabilities between the states belong to intervals. In this paper, we study the problem of efficient estimation of the optimal policies in Interval…
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Mean payoff (or long-run average reward) provides a mathematically elegant formalism to express performance related…
The interactive partially observable Markov decision process (I-POMDP) is a recently developed framework which extends the POMDP to the multi-agent setting by including agent models in the state space. This paper argues for formulating the…
In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…
In this work, we study the problem of actively classifying the attributes of dynamical systems characterized as a finite set of Markov decision process (MDP) models. We are interested in finding strategies that actively interact with the…
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding…
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents…
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents…
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…
The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions. In this paper, we propose practical online MDP algorithms with policy iteration and…
Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm which enables to optimize an alternative objective: the probability that the gain is…
Partially observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system…
General purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes…