Error Controlled Actor-Critic

Machine Learning 2021-09-08 v2

Abstract

Approximation error of the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-Critic, which confines the approximation error in the value function. We present an analysis of how the approximation error can hinder the optimization process of actor-critic methods. Then, we derive an upper bound on the approximation error of the Q-function approximator and find that the error can be lowered by restricting the KL divergence between every two consecutive policies when training the policy. The results of experiments on a range of continuous control tasks demonstrate that the proposed actor-critic algorithm noticeably reduces the approximation error and significantly outperforms other model-free RL algorithms.
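To make the idea of restricting the KL divergence between consecutive policies concrete, the following is a minimal sketch, not the paper's implementation. It assumes diagonal Gaussian policies evaluated at a single state, measures KL(old || new) (the direction is an assumption), and adds it as a penalty to a generic actor loss. The names `gaussian_kl`, `penalized_actor_loss`, and `kl_coef` are hypothetical and do not come from the abstract.

```python
import math


def gaussian_kl(mu_old, std_old, mu_new, std_new):
    """KL(old || new) between two diagonal Gaussian policies at one state.

    Each argument is a sequence with one entry per action dimension.
    """
    kl = 0.0
    for m0, s0, m1, s1 in zip(mu_old, std_old, mu_new, std_new):
        # Closed-form KL divergence between two univariate Gaussians.
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl


def penalized_actor_loss(q_value, kl, kl_coef=1.0):
    """Hypothetical actor loss: maximize Q while penalizing policy change.

    kl_coef is an assumed hyperparameter that trades off return
    against how far the new policy may move from the previous one.
    """
    return -q_value + kl_coef * kl


# Usage: the policy mean moves by 1.0 in a single action dimension.
kl = gaussian_kl([0.0], [1.0], [1.0], [1.0])
loss = penalized_actor_loss(q_value=2.0, kl=kl, kl_coef=0.1)
```

A larger `kl_coef` keeps consecutive policies closer together, which is the quantity the abstract says controls the approximation error. How the paper actually enforces the restriction is not stated in the abstract.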

Cite

@article{arxiv.2109.02517,
  title  = {Error Controlled Actor-Critic},
  author = {Xingen Gao and Fei Chao and Changle Zhou and Zhen Ge and Chih-Min Lin and Longzhi Yang and Xiang Chang and Changjing Shang},
  journal= {arXiv preprint arXiv:2109.02517},
  year   = {2021}
}