Related papers: Efficient Inference in Markov Control Problems
This note revisits the rolling-horizon control approach to the problem of a Markov decision process (MDP) with an infinite-horizon discounted expected reward criterion. Distinguished from the classical value-iteration approach, we develop an…
The infinite horizon setting is widely adopted for problems of reinforcement learning (RL). These invariably result in stationary policies that are optimal. In many situations, finite horizon control problems are of interest and for such…
In the context of Markov decision processes running in continuous time, one of the most intriguing challenges is the efficient approximation of finite horizon reachability objectives. A multitude of sophisticated model checking algorithms…
The goal of this paper is to analyze distributional Markov Decision Processes as a class of control problems in which the objective is to learn policies that steer the distribution of a cumulative reward toward a prescribed target law,…
We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we…
Many control problems in environments that can be modeled as Markov decision processes (MDPs) concern infinite-time horizon specifications. The classical aim in this context is to compute a control policy that maximizes the probability of…
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…
Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for continuous space-time control problems with nonlinear state dynamics has been elusive. This paper proposes proximal gradient…
Policy gradient methods are widely used in reinforcement learning. Yet, the nonconvexity of policy optimization poses significant challenges in understanding the global convergence of policy gradient methods. For a class of finite-horizon…
We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that…
We analyse a version of the policy iteration algorithm for the discounted infinite-horizon problem for controlled multidimensional diffusion processes, where both the drift and the diffusion coefficient can be controlled. We prove that,…
We consider a problem of optimal control of an infinite horizon system governed by forward-backward stochastic differential equations with delay. Sufficient and necessary maximum principles for optimal control under partial information in…
We analyze the infinite horizon minimax average cost Markov Control Model (MCM), for a class of controlled process conditional distributions, which belong to a ball, with respect to total variation distance metric, centered at a known…
In this article, we discuss two algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems or so-called deterministic trajectory optimization problems. Both algorithms can be derived from an…
We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown that policy-iteration-style algorithms have exponential lower bounds in a two-player game setting. We extend these lower bounds to Markov…
In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm,…
We study risk-sensitive control of continuous time Markov chains taking values in discrete state space. We study both finite and infinite horizon problems. In the finite horizon problem we characterise the value function via HJB equation…
In this work, solution of the finite horizon hybrid optimal control problem as the central element of the receding horizon optimal control (model predictive control) is investigated based on the indirect approach. The response of a hybrid…
We present discrete-time approximation of optimal control policies for infinite horizon discounted/ergodic control problems for controlled diffusions in $\mathbb{R}^d$. In particular, our objective is to show near optimality of optimal policies…
We present the first finite time global convergence analysis of policy gradient in the context of infinite horizon average reward Markov decision processes (MDPs). Specifically, we focus on ergodic tabular MDPs with finite state and action…