Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Pan Xu; Felicia Gao; Quanquan Gu

Machine Learning | 2021-08-03 | v3 | Optimization and Control | Machine Learning

Abstract

Improving the sample efficiency of reinforcement learning is a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which requires only $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2\leq\epsilon$). This sample complexity improves on the existing result of $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. We also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
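
The improvement factor quoted above follows from dividing the two rates. As a short worked computation, ignoring the constants hidden in the $O(\cdot)$ notation and taking $\epsilon = 10^{-6}$ purely as an illustrative value (it is an assumption for this example, not a setting from the paper):

$$
\frac{1/\epsilon^{5/3}}{1/\epsilon^{3/2}}
= \epsilon^{\,3/2 - 5/3}
= \epsilon^{\,9/6 - 10/6}
= \epsilon^{-1/6}
= \frac{1}{\epsilon^{1/6}},
\qquad
\epsilon = 10^{-6}:\quad
\frac{1}{\epsilon^{5/3}} = 10^{10},\quad
\frac{1}{\epsilon^{3/2}} = 10^{9},\quad
\frac{1}{\epsilon^{1/6}} = 10 .
$$

Under that illustrative choice, the two bounds differ by roughly one order of magnitude in the number of episodes, up to constants.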

Keywords

policy gradient; stochastic gradient descent; randomized algorithm

Cite

@article{arxiv.1909.08610,
  title  = {Sample Efficient Policy Gradient Methods with Recursive Variance Reduction},
  author = {Pan Xu and Felicia Gao and Quanquan Gu},
  journal= {arXiv preprint arXiv:1909.08610},
  year   = {2021}
}

Comments

23 pages, 2 figures, 3 tables. In ICLR 2020
