Variational Dynamic Programming for Stochastic Optimal Control
Abstract
We consider the problem of stochastic optimal control in which the state-feedback control policies take the form of probability distributions and a penalty on their entropy is added to the cost. By viewing the cost function as a Kullback-Leibler (KL) divergence between two joint distributions, we bring the tools of variational inference to bear on the optimal control problem. This allows us to derive a dynamic programming principle in which the value function is itself defined as a KL divergence. We then approximate the control policies with Gaussian distributions and apply the theory to control-affine nonlinear systems with quadratic costs. This results in closed-form recursive updates that generalize LQR control and the backward Riccati equation. We illustrate the method on the simple problem of stabilizing an inverted pendulum.
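For orientation, the classical baseline that the abstract says is generalized can be sketched as follows. This is the standard finite-horizon, discrete-time LQR backward Riccati recursion, not the variational update derived in the paper. The function name `lqr_backward_riccati`, the Euler-discretized pendulum linearization, and all numerical values (`dt`, `g_over_l`, `Q`, `R`, the horizon) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def lqr_backward_riccati(A, B, Q, R, Q_final, horizon):
    """Standard finite-horizon discrete-time LQR (hypothetical helper).

    Minimizes sum_t (x_t^T Q x_t + u_t^T R u_t) + x_T^T Q_final x_T
    subject to x_{t+1} = A x_t + B u_t.
    Returns the feedback gains K_t (with u_t = -K_t x_t) and the
    cost-to-go matrices P_t, both ordered from t = 0 onward.
    """
    P = Q_final
    gains = []
    cost_matrices = [P]
    for _ in range(horizon):
        # Gain: K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati step: P = Q + A^T P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        # Symmetrize to limit round-off drift
        P = 0.5 * (P + P.T)
        gains.append(K)
        cost_matrices.append(P)
    # The recursion runs backward in time, so reverse to get t = 0 first
    gains.reverse()
    cost_matrices.reverse()
    return gains, cost_matrices


# Assumed toy model: pendulum linearized about the upright position,
# state = (angle, angular velocity), Euler-discretized with step dt.
dt = 0.05
g_over_l = 9.81
A = np.array([[1.0, dt], [g_over_l * dt, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])

gains, cost_matrices = lqr_backward_riccati(A, B, Q, R, Q, horizon=200)
K0 = gains[0]
spectral_radius = max(abs(np.linalg.eigvals(A - B @ K0)))
print("initial gain:", K0)
print("closed-loop spectral radius:", spectral_radius)
```

A closed-loop spectral radius below one indicates that the first-step gain stabilizes the linearized model. In the setting described in the abstract, the deterministic feedback law would be replaced by a Gaussian policy and the recursion by the paper's closed-form variational updates.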
Cite
@article{arxiv.2404.14806,
title = {Variational Dynamic Programming for Stochastic Optimal Control},
author = {Marc Lambert and Francis Bach and Silvère Bonnabel},
journal = {arXiv preprint arXiv:2404.14806},
year = {2024}
}