Policy Gradient for Rectangular Robust Markov Decision Processes

Navdeep Kumar; Esther Derman; Matthieu Geist; Kfir Levy; Shie Mannor

Machine Learning 2023-12-12 v2 Artificial Intelligence

Abstract

Policy gradient methods have become a standard for training reinforcement learning agents in a scalable and efficient manner. However, they do not account for transition uncertainty, whereas learning robust policies can be computationally expensive. In this paper, we introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs). We provide a closed-form expression for the worst occupation measure. Incidentally, we find that the worst kernel is a rank-one perturbation of the nominal. Combining the worst occupation measure with a robust Q-value estimation yields an explicit form of the robust gradient. Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent. Hence, it relieves the computational burden of convex optimization problems required for training robust policies by current policy gradient approaches.
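To make the phrase "rank-one perturbation of the nominal" concrete, here is a minimal sketch in plain Python. It is not the paper's closed-form worst kernel: the per-state scales `b`, the direction `k`, and the 3-state nominal kernel are all hypothetical values chosen for illustration. It only shows that subtracting an outer product `b k^T` with a zero-sum `k` from a row-stochastic matrix can yield another valid transition kernel whose difference from the nominal has rank one.

```python
def rank_one_perturbation(nominal, b, k):
    """Return nominal - outer(b, k) as a list of rows.

    nominal: S x S row-stochastic matrix (list of lists).
    b: per-state nonnegative scales (hypothetical, not from the paper).
    k: direction whose entries sum to zero (hypothetical, not from the paper).
    """
    return [[p - b_s * k_j for p, k_j in zip(row, k)]
            for row, b_s in zip(nominal, b)]


def is_row_stochastic(matrix, tol=1e-12):
    """Check that every row is nonnegative and sums to one."""
    return all(
        abs(sum(row) - 1.0) <= tol and all(x >= -tol for x in row)
        for row in matrix
    )


def all_2x2_minors_vanish(matrix, tol=1e-12):
    """A matrix has rank at most one iff all of its 2x2 minors are zero."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    for i in range(n_rows):
        for j in range(i + 1, n_rows):
            for c in range(n_cols):
                for d in range(c + 1, n_cols):
                    minor = matrix[i][c] * matrix[j][d] - matrix[i][d] * matrix[j][c]
                    if abs(minor) > tol:
                        return False
    return True


# Illustrative nominal kernel over three states (made-up numbers).
nominal = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
]
b = [0.05, 0.10, 0.02]   # assumed per-state perturbation sizes
k = [-1.0, 0.0, 1.0]     # assumed zero-sum direction: mass moves from state 2 to state 0

perturbed = rank_one_perturbation(nominal, b, k)
difference = [[q - p for p, q in zip(r0, r1)]
              for r0, r1 in zip(nominal, perturbed)]

print(is_row_stochastic(perturbed))       # rows still form valid distributions
print(all_2x2_minors_vanish(difference))  # the perturbation has rank at most one
```

Because `k` sums to zero, every row of the perturbed matrix still sums to one; nonnegativity holds here only because the assumed scales in `b` are small relative to the nominal entries.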

Keywords

policy gradient; Markov decision processes; reinforcement learning

Cite

@article{arxiv.2301.13589,
  title  = {Policy Gradient for Rectangular Robust Markov Decision Processes},
  author = {Navdeep Kumar and Esther Derman and Matthieu Geist and Kfir Levy and Shie Mannor},
  journal= {arXiv preprint arXiv:2301.13589},
  year   = {2023}
}

Comments

Accepted to NeurIPS 2023
