Meta-Model-Based Meta-Policy Optimization

Takuya Hiraoka; Takahisa Imagawa; Voot Tangkaratt; Takayuki Osa; Takashi Onishi; Yoshimasa Tsuruoka


Machine Learning, 2021-10-12, v5


Abstract

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.
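For context, the single-task result that the abstract says is extended is the return bound of Janner et al. (2019). As recalled from that paper (not from this one), it states that the true return satisfies eta[pi] >= eta_hat[pi] - C(eps_m, eps_pi). Here eta_hat is the return under the learned model, eps_m bounds the model's total-variation error, eps_pi bounds the policy shift, r_max is the maximum reward and gamma the discount. The sketch below simply evaluates that penalty term. The function names are hypothetical, and this is neither M3PO's meta-RL bound nor the authors' code.

```python
def mbpo_return_gap(r_max: float, gamma: float, eps_m: float, eps_pi: float) -> float:
    """Penalty C(eps_m, eps_pi) in eta[pi] >= eta_hat[pi] - C (Janner et al., 2019).

    Hypothetical helper; the formula is the single-task bound as recalled,
    not the meta-RL bound derived in the M3PO paper.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    # Model-error and policy-shift terms scale with 1 / (1 - gamma)^2.
    horizon_term = 2.0 * gamma * r_max * (eps_m + 2.0 * eps_pi) / (1.0 - gamma) ** 2
    # Additional policy-shift term scaling with 1 / (1 - gamma).
    policy_term = 4.0 * r_max * eps_pi / (1.0 - gamma)
    return horizon_term + policy_term


def guaranteed_return(model_return: float, r_max: float, gamma: float,
                      eps_m: float, eps_pi: float) -> float:
    """Lower bound on the true return given the return measured in the model."""
    return model_return - mbpo_return_gap(r_max, gamma, eps_m, eps_pi)


if __name__ == "__main__":
    # With gamma = 0.5, a 10% model error and no policy shift cost 0.4 reward units.
    print(mbpo_return_gap(r_max=1.0, gamma=0.5, eps_m=0.1, eps_pi=0.0))
```

The 1 / (1 - gamma)^2 factor shows why such guarantees become loose for long horizons unless model error is small.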

Keywords

policy gradient, reinforcement learning, hyperparameter optimization

Cite

@article{arxiv.2006.02608,
  title  = {Meta-Model-Based Meta-Policy Optimization},
  author = {Takuya Hiraoka and Takahisa Imagawa and Voot Tangkaratt and Takayuki Osa and Takashi Onishi and Yoshimasa Tsuruoka},
  journal= {arXiv preprint arXiv:2006.02608},
  year   = {2021}
}

Comments

ACML 2021. Video demo: https://drive.google.com/file/d/1DRA-pmIWnHGNv5G_gFrml8YzKCtMcGnu/view?usp=sharing Source code: https://github.com/TakuyaHiraoka/Meta-Model-Based-Meta-Policy-Optimization
