Related papers: A policy iteration algorithm for non-Markovian con…
We introduce a continuous policy-value iteration algorithm where the approximations of the value function of a stochastic control problem and the optimal control are simultaneously updated through Langevin-type dynamics. This framework…
We propose a method for designing policies for convex stochastic control problems characterized by random linear dynamics and convex stage cost. We consider policies that employ quadratic approximate value functions as a substitute for the…
We present a midpoint policy iteration algorithm to solve linear quadratic optimal control problems in both model-based and model-free settings. The algorithm is a variation of Newton's method, and we show that in the model-based setting it…
In this paper, we consider a class of continuous-time, continuous-space stochastic optimal control problems. Building upon recent advances in Markov chain approximation methods and sampling-based algorithms for deterministic path planning,…
This article presents a constrained policy optimization approach for the optimal control of systems under nonstationary uncertainties. We introduce an assumption that we call Markov embeddability that allows us to cast the stochastic…
We propose a policy iteration algorithm for solving the multiplicative noise linear quadratic output feedback design problem. The algorithm solves a set of coupled Riccati equations for estimation and control arising from a partially…
In this paper, we consider the gradual-impulse control problem of continuous-time Markov decision processes, where the system performance is measured by the expectation of the exponential utility of the total cost. We prove, under very…
This paper studies an optimal control problem for continuous-time stochastic systems subject to reachability objectives specified in a subclass of metric interval temporal logic specifications, a temporal logic with real-time constraints.…
For a general entropy-regularized stochastic control problem on an infinite horizon, we prove that a policy iteration algorithm (PIA) converges to an optimal relaxed control. Contrary to the standard stochastic control literature, classical…
This paper studies a continuous-time stochastic linear-quadratic (SLQ) optimal control problem on an infinite horizon. A data-driven policy iteration algorithm is proposed to solve the SLQ problem. Without knowing three system coefficient…
We present a continuous-time equivalent to the well-known iterative linear-quadratic algorithm including an implementation of a backtracking line-search policy and a novel regularization approach based on the necessary conditions in the…
The main goal of this paper is to apply the so-called policy iteration algorithm (PIA) for the long run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with…
In this paper we investigate the convergence of the Policy Iteration Algorithm (PIA) for a class of general continuous-time entropy-regularized stochastic control problems. In particular, instead of employing sophisticated PDE estimates for…
Adaptive optimal control of nonlinear dynamic systems with deterministic and known dynamics under a known undiscounted infinite-horizon cost function is investigated. A policy iteration scheme initiated using a stabilizing initial control is…
Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the…
We propose a machine learning algorithm for solving finite-horizon stochastic control problems based on a deep neural network representation of the optimal policy functions. The algorithm has three features: (1) It can solve…
We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by "controlled" Markov noise. In particular, the faster and slower recursions have non-additive controlled Markov noise…
We consider stochastic control models with Borel spaces and universally measurable policies. For such models the standard policy iteration is known to have difficult measurability issues and cannot be carried out in general. We present a…
We present an accelerated algorithm for the solution of static Hamilton-Jacobi-Bellman equations related to optimal control problems. Our scheme is based on a classic policy iteration procedure, which is known to have superlinear…
We consider the problem of optimally controlling stochastic, Markovian systems subject to joint chance constraints over a finite-time horizon. For such problems, standard Dynamic Programming is inapplicable due to the time correlation of…