True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide

Machine Learning 2015-07-28 v1

Abstract

This document is a guide to the implementation of true online emphatic TD($\lambda$), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, and all the generality of general value functions, as well as the emphasis algorithm's notion of "interest".

Cite

@article{arxiv.1507.07147,
  title  = {True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide},
  author = {Richard S. Sutton},
  journal= {arXiv preprint arXiv:1507.07147},
  year   = {2015}
}