Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning

Long Yang; Yu Zhang; Qian Zheng; Pengfei Li; Gang Pan

Machine Learning 2019-09-09 v1 Machine Learning

Abstract

Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach that unifies them with eligibility traces through the sampling degree $\sigma$. However, it is limited to the tabular case: for large-scale learning, Q$(\sigma,\lambda)$ is too expensive because it requires a huge volume of tables to store value functions accurately. To address this problem, we propose GQ$(\sigma,\lambda)$, which extends tabular Q$(\sigma,\lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma,\lambda)$. Empirical results on some standard domains show that GQ$(\sigma,\lambda)$, which combines full sampling with pure expectation, reaches better performance than either full-sampling or pure-expectation methods alone.
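To make the role of the sampling degree $\sigma$ concrete, here is a minimal sketch of a one-step $\sigma$-mixed backup target with a linear value function $Q(s,a) = \theta^\top \phi(s,a)$. It is a simplified semi-gradient illustration only, not the GQ$(\sigma,\lambda)$ algorithm of the paper: it omits eligibility traces and any gradient-correction terms, and all function and parameter names (`q_sigma_target`, `semi_gradient_step`, `alpha`, and so on) are assumptions introduced for this example.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def q_sigma_target(reward: float, gamma: float, sigma: float,
                   q_next: Sequence[float], policy_next: Sequence[float],
                   next_action: int) -> float:
    """One-step sigma-mixed target (illustrative, no eligibility traces).

    sigma = 1 -> full sampling: bootstrap from the sampled next action only.
    sigma = 0 -> pure expectation: bootstrap from the policy-weighted average.
    Values in between interpolate linearly between the two.
    """
    sampled = q_next[next_action]
    expected = dot(policy_next, q_next)
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def semi_gradient_step(theta: Sequence[float], phi_sa: Sequence[float],
                       phi_next: Sequence[Sequence[float]],
                       policy_next: Sequence[float], next_action: int,
                       reward: float, gamma: float, sigma: float,
                       alpha: float) -> List[float]:
    """Plain semi-gradient update of theta toward the sigma-mixed target.

    phi_next[a] is the (assumed) feature vector of the next state paired
    with action a; alpha is a hypothetical step size.
    """
    q_next = [dot(theta, phi) for phi in phi_next]
    target = q_sigma_target(reward, gamma, sigma, q_next, policy_next, next_action)
    delta = target - dot(theta, phi_sa)  # temporal-difference error
    return [t + alpha * delta * f for t, f in zip(theta, phi_sa)]
```

Setting `sigma` to 1 or 0 recovers the two extremes named in the abstract, and intermediate values give the mixture that the empirical results refer to.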

Keywords

randomized algorithm; sequence alignment; reinforcement learning

Cite

@article{arxiv.1909.02877,
  title  = {Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning},
  author = {Long Yang and Yu Zhang and Qian Zheng and Pengfei Li and Gang Pan},
  journal= {arXiv preprint arXiv:1909.02877},
  year   = {2019}
}
