Adaptive Lambda Least-Squares Temporal Difference Learning

Timothy A. Mann; Hugo Penedones; Shie Mannor; Todd Hester

Machine Learning 2017-01-02 v1 Artificial Intelligence Machine Learning

Abstract

Temporal Difference learning, or TD($\lambda$), is a fundamental algorithm in the field of reinforcement learning. However, setting TD's $\lambda$ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the $\lambda$ selection problem as a bias-variance trade-off where the solution is the value of $\lambda$ that leads to the smallest Mean Squared Value Error (MSVE). To solve this trade-off we suggest applying Leave-One-Trajectory-Out Cross-Validation (LOTO-CV) to search the space of $\lambda$ values. Unfortunately, this approach is too computationally expensive for most practical applications. For Least Squares TD (LSTD) we show that LOTO-CV can be implemented efficiently to automatically tune $\lambda$, and we apply function optimization methods to efficiently search the space of $\lambda$ values. The resulting algorithm, ALLSTD, is parameter free, and our experiments demonstrate that ALLSTD is significantly faster computationally than the naïve LOTO-CV implementation while achieving similar performance.
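
To make the LOTO-CV idea concrete, the following is a minimal sketch of a grid search over $\lambda$ for LSTD($\lambda$). It is not the ALLSTD algorithm: the efficient leave-one-out update and the function optimization over $\lambda$ are not reproduced here. The function names, the `ridge` regularizer, the episodic setting with a terminal state after the last step, and the use of Monte Carlo returns from the held-out trajectory as the validation target are all assumptions made for illustration.

```python
import numpy as np

def lstd_stats(phi, rewards, gamma, lam):
    """LSTD(lambda) statistics (A, b) for one episodic trajectory.

    phi: (T, d) features of visited states; rewards: (T,) rewards.
    Assumption: the state after the last step is terminal (zero features).
    """
    T, d = phi.shape
    A, b, z = np.zeros((d, d)), np.zeros(d), np.zeros(d)
    for t in range(T):
        z = gamma * lam * z + phi[t]              # eligibility trace
        phi_next = phi[t + 1] if t + 1 < T else np.zeros(d)
        A += np.outer(z, phi[t] - gamma * phi_next)
        b += z * rewards[t]
    return A, b

def mc_returns(rewards, gamma):
    """Discounted Monte Carlo return from every time step."""
    G, acc = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        G[t] = acc
    return G

def loto_cv_lambda(trajs, gamma, lambdas, ridge=1e-6):
    """Return (best_lambda, cv_error) from a grid search.

    trajs: list of (phi, rewards) pairs. Each fold fits LSTD(lambda) on
    all trajectories except one and scores it on the held-out one.
    """
    d = trajs[0][0].shape[1]
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        stats = [lstd_stats(phi, r, gamma, lam) for phi, r in trajs]
        A_all = sum(A for A, _ in stats)
        b_all = sum(b for _, b in stats)
        err = 0.0
        for (phi, r), (A_i, b_i) in zip(trajs, stats):
            # Statistics of the other trajectories; ridge is a hypothetical
            # safeguard against a singular matrix.
            theta = np.linalg.solve(A_all - A_i + ridge * np.eye(d),
                                    b_all - b_i)
            err += np.mean((phi @ theta - mc_returns(r, gamma)) ** 2)
        err /= len(trajs)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err
```

Even with the per-trajectory statistics reused across folds, this sketch solves one linear system per trajectory per candidate $\lambda$, which illustrates why a direct search becomes expensive as the number of trajectories and candidate values grows.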

Cite

@article{arxiv.1612.09465,
  title  = {Adaptive Lambda Least-Squares Temporal Difference Learning},
  author = {Timothy A. Mann and Hugo Penedones and Shie Mannor and Todd Hester},
  journal= {arXiv preprint arXiv:1612.09465},
  year   = {2017}
}
