Sample Efficient Policy Gradient Methods with Recursive Variance Reduction
Abstract
Improving the sample efficiency in reinforcement learning has been a long-standing research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which only requires $O(1/\epsilon^{3/2})$ episodes to find an $\epsilon$-approximate stationary point of the nonconcave performance function $J(\boldsymbol{\theta})$ (i.e., $\boldsymbol{\theta}$ such that $\|\nabla J(\boldsymbol{\theta})\|_2^2 \leq \epsilon$). This sample complexity improves the existing result $O(1/\epsilon^{5/3})$ for stochastic variance reduced policy gradient algorithms by a factor of $O(1/\epsilon^{1/6})$. In addition, we also propose a variant of SRVR-PG with parameter exploration, which explores the initial policy parameter from a prior probability distribution. We conduct numerical experiments on classic control problems in reinforcement learning to validate the performance of our proposed algorithms.
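As a rough sanity check on the stated improvement factor, the ratio of the two episode complexities can be worked out directly. The rates used here are $O(1/\epsilon^{3/2})$ for SRVR-PG and $O(1/\epsilon^{5/3})$ for earlier stochastic variance reduced policy gradient methods; constants hidden in the $O(\cdot)$ notation are ignored, and the value $\epsilon = 10^{-6}$ is only an illustrative assumption.

```latex
\[
\frac{1/\epsilon^{5/3}}{1/\epsilon^{3/2}}
  = \epsilon^{3/2 - 5/3}
  = \epsilon^{-1/6}
  = \frac{1}{\epsilon^{1/6}},
\qquad
\epsilon = 10^{-6} \;\Rightarrow\; \frac{1}{\epsilon^{1/6}} = 10 .
\]
```

Under these assumptions, a target accuracy of $\epsilon = 10^{-6}$ would correspond to roughly a tenfold reduction in the number of episodes, up to constant factors.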
Cite
@article{arxiv.1909.08610,
title = {Sample Efficient Policy Gradient Methods with Recursive Variance Reduction},
author = {Pan Xu and Felicia Gao and Quanquan Gu},
journal= {arXiv preprint arXiv:1909.08610},
year = {2019}
}
Comments
23 pages, 2 figures, 3 tables. In ICLR 2020