Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning
Machine Learning
2019-09-09 v1 Machine Learning
Abstract
Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma, \lambda)$ is the first approach that unifies them with eligibility traces through the sampling degree $\sigma$. However, it is limited to the tabular case: for large-scale learning, Q$(\sigma, \lambda)$ is too expensive because it requires a huge volume of tables to store value functions accurately. To address this problem, we propose GQ$(\sigma, \lambda)$, which extends tabular Q$(\sigma, \lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma, \lambda)$. Empirical results on some standard domains show that GQ$(\sigma, \lambda)$, combining full sampling with pure expectation, achieves better performance than either full-sampling or pure-expectation methods alone.
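To illustrate how a sampling degree can interpolate between a sampled target and an expected target under linear function approximation, here is a minimal sketch. It is not the paper's algorithm: it omits eligibility traces and any gradient-correction term, and uses a plain one-step semi-gradient update. The function names (`q_value`, `q_sigma_target`, `semi_gradient_update`) and the per-action feature layout are assumptions made for this example only.

```python
def q_value(theta, phi):
    """Linear action value: dot product of weights and features."""
    return sum(w * x for w, x in zip(theta, phi))


def q_sigma_target(theta, reward, gamma, sigma,
                   next_features, next_action, policy_probs):
    """One-step target mixed by the sampling degree sigma in [0, 1].

    next_features[a] is the (assumed) feature vector of (s', a);
    policy_probs[a] is pi(a | s').
    sigma = 1 uses only the sampled next action (full sampling);
    sigma = 0 uses only the expectation under pi (pure expectation).
    """
    sampled = q_value(theta, next_features[next_action])
    expected = sum(p * q_value(theta, f)
                   for p, f in zip(policy_probs, next_features))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def semi_gradient_update(theta, phi, target, alpha):
    """Semi-gradient step: theta <- theta + alpha * (target - q) * phi."""
    delta = target - q_value(theta, phi)
    return [w + alpha * delta * x for w, x in zip(theta, phi)]


# Toy usage with two actions and one-hot features (illustrative values only).
theta = [1.0, 2.0]
next_features = [[1.0, 0.0], [0.0, 1.0]]
policy_probs = [0.5, 0.5]
target = q_sigma_target(theta, reward=1.0, gamma=0.9, sigma=0.5,
                        next_features=next_features, next_action=1,
                        policy_probs=policy_probs)
theta = semi_gradient_update(theta, phi=[1.0, 0.0], target=target, alpha=0.5)
```

Intermediate values of `sigma` blend the two targets linearly, which is the sense in which a single parameter can span full-sampling and pure-expectation updates.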
Cite
@article{arxiv.1909.02877,
title = {Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning},
author = {Long Yang and Yu Zhang and Qian Zheng and Pengfei Li and Gang Pan},
journal = {arXiv preprint arXiv:1909.02877},
year = {2019}
}