Related papers: Maximum Entropy Differential Dynamic Programming
Differential Dynamic Programming is an optimal control technique often used for trajectory generation. Many variations of this algorithm have been developed in the literature, including algorithms for stochastic dynamics or state and input…
Differential Dynamic Programming (DDP) has become a well-established method for unconstrained trajectory optimization. Despite its several applications in robotics and controls, however, a widely successful constrained version of the…
This article proposes an improved trajectory optimization approach for stochastic optimal control of dynamical systems affected by measurement noise by combining optimal control with maximum likelihood techniques to improve the reduction of…
We describe an approximate dynamic programming approach to compute lower bounds on the optimal value function for a discrete time, continuous space, infinite horizon setting. The approach iteratively constructs a family of lower bounding…
The Pontryagin-type maximum principle and Bellman's dynamic programming principle are two of the most important tools for solving optimal control problems. There is a large literature on the relationship between them. The main…
Differential Dynamic Programming (DDP) is an efficient trajectory optimization algorithm relying on second-order approximations of a system's dynamics and cost function, and has recently been applied to optimize systems with time-invariant…
The aim of this paper is to address optimality of stochastic control strategies via dynamic programming subject to total variation distance ambiguity on the conditional distribution of the controlled process. We formulate the stochastic…
We present an algorithm called Tropical Dynamic Programming (TDP) which builds upper and lower approximations of the Bellman value functions in risk-neutral Multistage Stochastic Programming (MSP), with independent noises of finite…
Recent work [Ran22] formulated a class of optimal control problems involving positive linear systems, linear stage costs, and elementwise constraints on control. It was shown that the problem admits linear optimal cost and the associated…
This paper builds on our recent work where we presented a dual stochastic optimal control formulation of the nonlinear filtering problem [1]. The constraint for the dual problem is a backward stochastic differential equation (BSDE). The…
Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot…
We present a sampling-based trajectory optimization method derived from the maximum entropy formulation of Differential Dynamic Programming with Tsallis entropy. This method is a generalization of the legacy work with Shannon entropy, which…
This paper presents a new theory, known as robust dynamic programming, for a class of continuous-time dynamical systems. Different from traditional dynamic programming (DP) methods, this new theory serves as a fundamental tool to analyze…
When an expert operates a perilous dynamic system, ideal constraint information is tacitly contained in their demonstrated trajectories and controls. The likelihood of these demonstrations can be computed, given the system dynamics and task…
We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes…
This paper introduces a novel Differential Dynamic Programming (DDP) algorithm for solving discrete-time finite-horizon optimal control problems with inequality constraints. Two variants, namely Feasible- and Infeasible-IPDDP algorithms,…
This paper investigates the relationship between Pontryagin's maximum principle and the dynamic programming principle in the context of stochastic optimal control systems governed by stochastic evolution equations with random coefficients in…
We analyze an optimal stopping problem with a constraint on the expected cost. When the reward function and cost function are Lipschitz continuous in the state variable, we show that the value of such an optimal stopping problem is a continuous…
The standard Dynamic Programming (DP) formulation can be used to solve Multi-Stage Optimization Problems (MSOPs) with additively separable objective functions. In this paper we consider a larger class of MSOPs with monotonically backward…
We show that the max entropy algorithm can be derandomized (with respect to a particular objective function) to give a deterministic $3/2-\epsilon$ approximation algorithm for metric TSP for some $\epsilon > 10^{-36}$. To obtain our result,…