Model-Augmented Q-learning

Machine Learning 2021-02-09 v1

Abstract

In recent years, Q-learning has become indispensable for model-free reinforcement learning (MFRL). However, it suffers from well-known problems such as under- and overestimation bias of the value, which can degrade policy learning. To resolve this issue, we propose an MFRL framework that is augmented with components of model-based RL. Specifically, we propose to estimate not only the Q-values but also the transition and the reward with a shared network. We further use the reward estimated by the model estimators for Q-learning, which promotes interaction between the estimators. We show that the proposed scheme, called Model-augmented Q-learning (MQL), obtains a policy-invariant solution that is identical to the solution obtained by learning with the true reward. Finally, we also provide a technique for prioritizing past experiences in the replay buffer by using model-estimation errors. We experimentally validate MQL built upon state-of-the-art off-policy MFRL methods, and show that MQL substantially improves their performance and convergence. The proposed scheme is simple to implement and incurs no additional training cost.
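The two mechanisms named in the abstract, bootstrapping from the estimated reward and prioritizing replay by model-estimation error, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `td_target` and `replay_priorities` are hypothetical, the discrete-action `max` backup is an assumption made for brevity, and the priority formula (sum of absolute reward and transition errors, normalized) is one plausible choice rather than the one used in the paper.

```python
from typing import List, Sequence


def td_target(est_reward: float,
              next_q_values: Sequence[float],
              gamma: float = 0.99,
              done: bool = False) -> float:
    """One-step Q-learning target built from the model's *estimated* reward.

    Assumption: a discrete action space, so the backup takes a max over the
    next-state Q-values. The estimated reward replaces the observed one.
    """
    bootstrap = 0.0 if done else gamma * max(next_q_values)
    return est_reward + bootstrap


def replay_priorities(reward_errors: Sequence[float],
                      transition_errors: Sequence[float],
                      eps: float = 1e-6) -> List[float]:
    """Map per-sample model-estimation errors to sampling probabilities.

    Assumption: priority is proportional to |reward error| + |transition
    error|; eps keeps every sample reachable even when its error is zero.
    """
    raw = [abs(r) + abs(t) + eps
           for r, t in zip(reward_errors, transition_errors)]
    total = sum(raw)
    return [p / total for p in raw]


# Usage: a target for one transition, and priorities for three stored samples.
target = td_target(est_reward=1.0, next_q_values=[0.5, 2.0], gamma=0.5)
probs = replay_priorities([0.1, 0.0, 0.4], [0.2, 0.0, 0.1])
```

In this sketch, samples on which the reward and transition estimators are least accurate are drawn more often, which is the general idea the abstract describes; the actual weighting scheme should be taken from the paper itself.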

Cite

@article{arxiv.2102.03866,
  title  = {Model-Augmented Q-learning},
  author = {Youngmin Oh and Jinwoo Shin and Eunho Yang and Sung Ju Hwang},
  journal= {arXiv preprint arXiv:2102.03866},
  year   = {2021}
}