Related papers: Contextual Markov Decision Processes
We study online learning in episodic constrained Markov decision processes (CMDPs), where the learner aims at collecting as much reward as possible over the episodes, while satisfying some long-term constraints during the learning process.…
Autonomous agents are often required to plan under multiple objectives whose preference ordering varies based on context. The agent may encounter multiple contexts during its course of operation, each imposing a distinct lexicographic…
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts…
In many interactive decision-making settings, there is latent and unobserved information that remains fixed. Consider, for example, a dialogue system, where complete information about a user, such as the user's preferences, is not given. In…
In many real-world problems, there is the possibility to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision…
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize…
We consider a reinforcement learning (RL) setting in which the agent interacts with a sequence of episodic MDPs. At the start of each episode the agent has access to some side-information or context that determines the dynamics of the MDP…
Online learning algorithms are designed to perform in non-stationary environments, but generally there is no notion of a dynamic state to model constraints on current and future actions as a function of past actions. State-based models are…
We study a class of sequential decision-making problems with augmented predictions, potentially provided by a machine learning algorithm. In this setting, the decision-maker receives prediction intervals for unknown parameters that become…
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. While CMDPs serve…
The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific…
In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing…
We study online learning in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints, under bandit feedback. We consider three scenarios. In the first one, we address general CMDPs, where we…
A fundamental (and largely open) challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary…
In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics is implicitly assumed to be stationary. This assumption of stationarity, while simplifying, can be unrealistic in many scenarios. In the…
This paper presents a hierarchical decision-making framework for autonomous systems operating under uncertainty, demonstrated through autonomous driving as a representative application. Surrounding agents are modeled using Hybrid Markov…
The main goal of this paper is to investigate continuous-time distributed dynamic programming (DP) algorithms for networked multi-agent Markov decision problems (MAMDPs). In our study, we adopt a distributed multi-agent framework where…
Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not…
We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive…