Control Regularization for Reduced Variance Reinforcement Learning

Richard Cheng; Abhinav Verma; Gabor Orosz; Swarat Chaudhuri; Yisong Yue; Joel W. Burdick

2019-05-15 (v1); Machine Learning; Systems and Control

Abstract

Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a policy prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the policy prior has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.
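The bias-variance trade-off described above can be illustrated with a minimal sketch. It assumes a scalar action and a quadratic penalty that pulls the learned action toward the prior's action; the names `mix_action`, `simulate`, and the weight `lam` are hypothetical and not taken from the abstract, and Gaussian noise merely stands in for run-to-run variation of a learned policy. Under these assumptions the penalized action is a weighted average of the two, so a larger `lam` shrinks the variance while pulling the mean toward the prior.

```python
import random
import statistics


def mix_action(u_learned, u_prior, lam):
    """Return the u that minimizes (u - u_learned)^2 + lam * (u - u_prior)^2.

    Setting the derivative to zero gives the weighted average below.
    lam = 0 recovers the learned action; a large lam approaches the prior.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return (u_learned + lam * u_prior) / (1.0 + lam)


def simulate(lam, u_prior=0.5, u_target=1.0, noise_std=1.0, runs=2000, seed=0):
    """Estimate (bias, variance) of the mixed action over simulated runs.

    Assumption: each run's learned action is the target action plus
    Gaussian noise, a stand-in for variation across seeds.
    """
    rng = random.Random(seed)
    mixed = [
        mix_action(rng.gauss(u_target, noise_std), u_prior, lam)
        for _ in range(runs)
    ]
    bias = statistics.fmean(mixed) - u_target
    variance = statistics.pvariance(mixed)
    return bias, variance


if __name__ == "__main__":
    # Larger lam: smaller variance, larger bias toward the (imperfect) prior.
    for lam in (0.0, 1.0, 4.0):
        bias, variance = simulate(lam)
        print(f"lam={lam:.1f}  bias={bias:+.3f}  variance={variance:.3f}")
```

In this toy setting the variance scales by 1/(1 + lam)^2 and the bias approaches lam/(1 + lam) times the gap between prior and target, which is why an adaptive choice of the weight can be useful when the quality of the prior is not known in advance.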

Keywords

regularization; reinforcement learning; reinforcement learning from human feedback

Cite

@article{arxiv.1905.05380,
  title  = {Control Regularization for Reduced Variance Reinforcement Learning},
  author = {Richard Cheng and Abhinav Verma and Gabor Orosz and Swarat Chaudhuri and Yisong Yue and Joel W. Burdick},
  journal= {arXiv preprint arXiv:1905.05380},
  year   = {2019}
}

Comments

Appearing in ICML 2019
