Accelerated Gradient Temporal Difference Learning

Artificial Intelligence, Machine Learning | v2, 2017-03-13

Abstract

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least-squares methods. Least-squares methods make the best use of available data by directly computing the TD solution, and thus do not require tuning a typically highly sensitive learning-rate parameter, but they require quadratic computation and storage. Recent algorithmic developments have yielded several sub-quadratic methods that use an approximation to the least-squares TD solution, but these incur bias. In this paper, we propose a new family of accelerated gradient TD (ATD) methods that (1) provide data-efficiency benefits similar to those of least-squares methods at a fraction of the computation and storage; (2) significantly reduce parameter sensitivity compared to linear TD methods; and (3) are asymptotically unbiased. We illustrate these claims with a proof of convergence in expectation and with experiments on several benchmark domains and a large-scale industrial energy allocation domain.
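To make the cost trade-off described above concrete, here is a minimal pure-Python sketch of the two ends of the spectrum: linear TD(0) (TD(λ) with λ = 0), which costs O(d) per step but needs a step size α, and batch LSTD(0), which accumulates a d × d matrix A and vector b and solves A w = b with no step size. This is background only and is not the ATD method proposed in the paper. The three-state chain, γ = 0.9, α = 0.1, and all function names are hypothetical, chosen purely for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# (features of s, reward, features of s'); a terminal s' uses an all-zero vector.
Transition = Tuple[Sequence[float], float, Sequence[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def td0_update(w: Vector, phi: Sequence[float], reward: float,
               phi_next: Sequence[float], gamma: float, alpha: float) -> None:
    """One linear TD(0) step, in place: O(d) time and memory, but needs a step size alpha."""
    delta = reward + gamma * dot(w, phi_next) - dot(w, phi)  # TD error
    for i, x in enumerate(phi):
        w[i] += alpha * delta * x


def solve(A: List[Vector], b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting (O(d^3)); assumes A is nonsingular."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for row in reversed(range(n)):  # back substitution
        x[row] = (M[row][n] - dot(M[row][row + 1:n], x[row + 1:n])) / M[row][row]
    return x


def lstd(transitions: Sequence[Transition], gamma: float, d: int) -> Vector:
    """Batch LSTD(0): accumulate A (d x d, quadratic storage) and b, then solve A w = b."""
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for phi, reward, phi_next in transitions:
        for i in range(d):
            b[i] += phi[i] * reward
            for j in range(d):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


# Hypothetical 3-state chain 0 -> 1 -> 2 -> terminal, reward 1 only on the last step.
d, gamma = 3, 0.9


def one_hot(i: int) -> Vector:
    return [1.0 if k == i else 0.0 for k in range(d)]


episode = [(one_hot(0), 0.0, one_hot(1)),
           (one_hot(1), 0.0, one_hot(2)),
           (one_hot(2), 1.0, [0.0] * d)]

w_td = [0.0] * d
for _ in range(500):  # TD(0) needs many passes over the data at this step size
    for phi, reward, phi_next in episode:
        td0_update(w_td, phi, reward, phi_next, gamma, alpha=0.1)

w_lstd = lstd(episode, gamma, d)  # LSTD uses a single pass over the same data
print([round(x, 4) for x in w_td], [round(x, 4) for x in w_lstd])
```

With one-hot (tabular) features, both estimates should approach the true values [0.81, 0.9, 1.0]. LSTD reaches them from one pass and no step size, while TD(0) needs repeated passes and a tuned α. That gap is what the abstract refers to when it contrasts data efficiency with per-step cost.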

Cite

@article{arxiv.1611.09328,
  title  = {Accelerated Gradient Temporal Difference Learning},
  author = {Yangchen Pan and Adam White and Martha White},
  journal = {arXiv preprint arXiv:1611.09328},
  year   = {2017}
}

Comments

AAAI Conference on Artificial Intelligence (AAAI), 2017