Implicit Temporal Differences

Aviv Tamar; Panos Toulis; Shie Mannor; Edoardo M. Airoldi

Machine Learning 2014-12-23 v1 Machine Learning

Authors: Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

Abstract

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD($\lambda$) is its sensitivity to the choice of the step-size. It is well known empirically that a large step-size leads to fast convergence, at the cost of higher variance and a risk of instability. In this work, we introduce the implicit TD($\lambda$) algorithm, which has the same function and computational cost as TD($\lambda$) but is significantly more stable. We provide a theoretical explanation of this stability and an empirical evaluation of implicit TD($\lambda$) on typical benchmark tasks. Our results show that implicit TD($\lambda$) outperforms both standard TD($\lambda$) and a state-of-the-art method that automatically tunes the step-size, and thus shows promise for wide applicability.
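To make the step-size sensitivity concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes linear value features, covers only the special case $\lambda = 0$, and takes "implicit" to mean that the current state's value is evaluated at the updated weights. Under that assumption the update can be solved in closed form and amounts to shrinking the step-size from $\alpha$ to $\alpha / (1 + \alpha \|\phi\|^2)$. The names `td0_step` and `run` and the one-state toy problem are hypothetical, chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td0_step(theta, phi, reward, phi_next, gamma, alpha, implicit=False):
    """One TD(0) update with linear features (illustrative sketch).

    Explicit: theta' = theta + alpha * delta * phi
    Implicit (assumed form): theta' = theta
        + alpha * (reward + gamma * phi_next.theta - phi.theta') * phi,
    which solves to the explicit update with the step-size
    alpha / (1 + alpha * ||phi||^2).
    """
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    step = alpha / (1.0 + alpha * dot(phi, phi)) if implicit else alpha
    return [t + step * delta * p for t, p in zip(theta, phi)]


def run(implicit, n_steps=50, alpha=1.0):
    """Toy problem (hypothetical): one state with feature [2.0],
    reward 1.0, then termination, so the true weight is 0.5."""
    theta = [0.0]
    for _ in range(n_steps):
        theta = td0_step(theta, [2.0], 1.0, [0.0], 0.0, alpha, implicit)
    return theta


theta_explicit = run(implicit=False)  # oscillates and blows up at alpha = 1.0
theta_implicit = run(implicit=True)   # settles near the true weight 0.5
```

With this deliberately large step-size the explicit iterate is multiplied by a factor of magnitude 3 around the solution at every step, while the implicit iterate contracts toward it. This matches the qualitative behaviour the abstract describes, though only in this toy setting.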

Keywords

online learning, deep learning, theoretical bounds and convergence

Cite

@article{arxiv.1412.6734,
  title  = {Implicit Temporal Differences},
  author = {Aviv Tamar and Panos Toulis and Shie Mannor and Edoardo M. Airoldi},
  journal= {arXiv preprint arXiv:1412.6734},
  year   = {2014}
}
