True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide

Richard S. Sutton

Machine Learning 2015-07-28 v1

Abstract

This document is a guide to the implementation of true online emphatic TD($\lambda$), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, and all the generality of general value functions, as well as the emphasis algorithm's notion of "interest".
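To make the emphasis idea named in the abstract concrete, the following is a minimal sketch of the conventional (not true-online) emphatic TD($\lambda$) update with linear function approximation, following the form given by Sutton, Mahmood & White (2015). It is not the true online algorithm that this guide specifies, which adds further correction terms. The class name `EmphaticTD`, the `step` signature, and the pure-Python `dot` helper are illustrative assumptions, not part of the original guide.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class EmphaticTD:
    """Sketch of conventional emphatic TD(lambda) with linear features.

    Assumption: this is the plain emphatic update, not the true online
    variant. Names and signatures here are hypothetical.
    """

    def __init__(self, n_features: int, alpha: float) -> None:
        self.alpha = alpha                       # step size
        self.theta: List[float] = [0.0] * n_features  # weight vector
        self.e: List[float] = [0.0] * n_features      # eligibility trace
        self.F = 0.0                             # follow-on trace
        self.prev_rho = 1.0  # has no effect on the first step because F starts at 0

    def step(
        self,
        phi: Sequence[float],       # features of the current state
        interest: float,            # interest I_t in the current state
        rho: float,                 # importance-sampling ratio rho_t
        gamma: float,               # discount gamma_t for the current state
        lam: float,                 # bootstrapping parameter lambda_t
        reward: float,              # reward R_{t+1}
        phi_next: Sequence[float],  # features of the next state
        gamma_next: float,          # discount gamma_{t+1} for the next state
    ) -> float:
        # Follow-on trace: F_t = rho_{t-1} * gamma_t * F_{t-1} + I_t
        self.F = self.prev_rho * gamma * self.F + interest
        # Emphasis: M_t = lambda_t * I_t + (1 - lambda_t) * F_t
        emphasis = lam * interest + (1.0 - lam) * self.F
        # TD error, computed with the weights before this update
        delta = reward + gamma_next * dot(self.theta, phi_next) - dot(self.theta, phi)
        # Eligibility trace: e_t = rho_t * (gamma_t * lambda_t * e_{t-1} + M_t * phi_t)
        self.e = [rho * (gamma * lam * e_i + emphasis * p_i)
                  for e_i, p_i in zip(self.e, phi)]
        # Weight update: theta_{t+1} = theta_t + alpha * delta_t * e_t
        self.theta = [th + self.alpha * delta * e_i
                      for th, e_i in zip(self.theta, self.e)]
        self.prev_rho = rho
        return delta
```

In the on-policy case every `rho` is 1; setting `interest` to 0 in a state removes that state's direct contribution to the emphasis, although emphasis can still flow into it from earlier states through the follow-on trace.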

Keywords

machine learning sequence alignment

Cite

@article{arxiv.1507.07147,
  title  = {True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide},
  author = {Richard S. Sutton},
  journal= {arXiv preprint arXiv:1507.07147},
  year   = {2015}
}
