Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

Long Yang; Yu Zhang; Jun Wen; Qian Zheng; Pengfei Li; Gang Pan

Machine Learning 2019-09-09 v2 Artificial Intelligence Machine Learning

Abstract

Off-policy learning is a powerful tool for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge that can drive off-policy learning into uncontrolled instability. In this paper, to reduce the variance, we introduce the control variate technique to $\mathtt{Expected~Sarsa}(\lambda)$ and propose a tabular $\mathtt{ES}(\lambda)$-$\mathtt{CV}$ algorithm. We prove that, given a proper estimator of the value function, the proposed $\mathtt{ES}(\lambda)$-$\mathtt{CV}$ enjoys a lower variance than $\mathtt{Expected~Sarsa}(\lambda)$. Furthermore, to extend $\mathtt{ES}(\lambda)$-$\mathtt{CV}$ to a convergent algorithm with linear function approximation, we propose the $\mathtt{GES}(\lambda)$ algorithm under a convex-concave saddle-point formulation. We prove that the convergence rate of $\mathtt{GES}(\lambda)$ is $\mathcal{O}(1/T)$, which matches or outperforms many state-of-the-art gradient-based algorithms while requiring a more relaxed condition. Numerical experiments show that the proposed algorithm performs better, with lower variance, than several state-of-the-art gradient-based TD learning algorithms: $\mathtt{GQ}(\lambda)$, $\mathtt{GTB}(\lambda)$ and $\mathtt{ABQ}(\zeta)$.
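To illustrate the general idea of a control variate in off-policy estimation, the following is a minimal sketch on a two-action bandit. It is not the paper's $\mathtt{ES}(\lambda)$-$\mathtt{CV}$ update. The policies `mu` and `pi`, the rewards, and the `baseline` value are all hypothetical numbers chosen for this example. The sketch relies only on the fact that the importance ratio has mean 1 under the behaviour policy, so adding `(1 - rho) * baseline` leaves the estimator unbiased.

```python
# Illustrative two-action bandit; every number below is an assumption
# made up for this sketch, not taken from the paper.
mu = [0.5, 0.5]       # behaviour policy (assumed)
pi = [0.9, 0.1]       # target policy (assumed)
reward = [1.0, 2.0]   # deterministic reward per action (assumed)
baseline = 1.0        # rough value estimate used as the control variate


def is_estimate(a):
    """Ordinary importance-sampled reward for action a."""
    rho = pi[a] / mu[a]
    return rho * reward[a]


def cv_estimate(a):
    """Same estimate plus the zero-mean term (1 - rho) * baseline."""
    rho = pi[a] / mu[a]
    return rho * reward[a] + (1.0 - rho) * baseline


def mean_and_variance(estimator):
    """Exact mean and variance of an estimator under the behaviour policy."""
    actions = range(len(mu))
    mean = sum(mu[a] * estimator(a) for a in actions)
    var = sum(mu[a] * (estimator(a) - mean) ** 2 for a in actions)
    return mean, var


is_mean, is_var = mean_and_variance(is_estimate)
cv_mean, cv_var = mean_and_variance(cv_estimate)
print(f"IS: mean={is_mean:.3f} var={is_var:.3f}")
print(f"CV: mean={cv_mean:.3f} var={cv_var:.3f}")
```

In this toy setting both estimators have mean 1.1, the value of the target policy, while the variance drops from 0.49 to 0.01. How much it drops depends on how close the baseline is to the true value, which mirrors the abstract's condition that a proper estimator of the value function is available.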

Keywords

randomized algorithm, convex optimization

Cite

@article{arxiv.1906.11058,
  title  = {Expected Sarsa($\lambda$) with Control Variate for Variance Reduction},
  author = {Long Yang and Yu Zhang and Jun Wen and Qian Zheng and Pengfei Li and Gang Pan},
  journal= {arXiv preprint arXiv:1906.11058},
  year   = {2019}
}
