True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide
Machine Learning
2015-07-28 v1
Abstract
This document is a guide to the implementation of true online emphatic TD($\lambda$), a model-free temporal-difference algorithm for learning to make long-term predictions. It combines the emphasis idea (Sutton, Mahmood & White 2015) with the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, the full generality of general value functions, and the emphasis algorithm's notion of "interest".
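To make the combination concrete, here is a minimal sketch of one linear learner. It is not the guide's own pseudocode. It assumes per-step updates of the following form: the followon trace F and emphasis M from emphatic TD(λ), together with a true-online ("dutch") eligibility trace and weight correction, both scaled by the importance-sampling ratio rho and the interest I. The class name, the method name, and the argument order are hypothetical, and the update equations are assumptions that should be checked against the guide before the code is relied on.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature/weight vectors."""
    return sum(x * y for x, y in zip(a, b))


class TrueOnlineEmphaticTD:
    """Hypothetical sketch of linear true online emphatic TD(lambda).

    Assumed updates for the transition S_t -> S_{t+1} (not quoted from the guide):
      delta_t = R_{t+1} + gamma_{t+1} theta_t.phi_{t+1} - theta_t.phi_t
      F_t     = rho_{t-1} gamma_t F_{t-1} + I_t            (followon trace)
      M_t     = lambda_t I_t + (1 - lambda_t) F_t           (emphasis)
      e_t     = rho_t gamma_t lambda_t e_{t-1}
                + rho_t alpha M_t (1 - rho_t gamma_t lambda_t e_{t-1}.phi_t) phi_t
      theta_{t+1} = theta_t + delta_t e_t
                    + (e_t - alpha M_t rho_t phi_t) (theta_t - theta_{t-1}).phi_t
    """

    def __init__(self, n_features: int, alpha: float) -> None:
        self.alpha = alpha
        self.theta: List[float] = [0.0] * n_features      # theta_t
        self.theta_old: List[float] = [0.0] * n_features  # theta_{t-1}
        self.e: List[float] = [0.0] * n_features          # e_{t-1}
        self.F = 0.0         # F_{t-1}; zero before the first step, so F_0 = I_0
        self.rho_prev = 1.0  # rho_{t-1}; irrelevant while F is still zero

    def step(self, phi: Sequence[float], reward: float, phi_next: Sequence[float],
             rho: float, gamma: float, gamma_next: float,
             lam: float, interest: float) -> float:
        """Process one transition and return the TD error delta_t."""
        a = self.alpha
        # TD error, computed with the current weights theta_t.
        delta = reward + gamma_next * dot(self.theta, phi_next) - dot(self.theta, phi)
        # Followon trace and emphasis.
        self.F = self.rho_prev * gamma * self.F + interest
        m = lam * interest + (1.0 - lam) * self.F
        # Dutch-trace correction term uses the previous trace e_{t-1}.
        c = rho * gamma * lam * dot(self.e, phi)
        self.e = [rho * gamma * lam * ei + rho * a * m * (1.0 - c) * pi
                  for ei, pi in zip(self.e, phi)]
        # True-online correction: (theta_t - theta_{t-1}).phi_t
        diff = dot(self.theta, phi) - dot(self.theta_old, phi)
        new_theta = [th + delta * ei + diff * (ei - a * m * rho * pi)
                     for th, ei, pi in zip(self.theta, self.e, phi)]
        self.theta_old, self.theta = self.theta, new_theta
        self.rho_prev = rho
        return delta


if __name__ == "__main__":
    # Single feature, every transition terminates (gamma = 0), reward 1:
    # the estimate should approach 1.
    agent = TrueOnlineEmphaticTD(n_features=1, alpha=0.1)
    for _ in range(200):
        agent.step([1.0], 1.0, [1.0], rho=1.0, gamma=0.0, gamma_next=0.0,
                   lam=0.9, interest=1.0)
    print(round(agent.theta[0], 4))  # prints 1.0
```

In the usage example, gamma is 0, so F and M are both 1 and the dutch-trace correction vanishes. The update then reduces to theta += alpha * delta * phi, and the estimate converges to the constant reward. A transition with rho = 0 (an action the target policy never takes) zeroes the trace and leaves the weights unchanged, which is the expected off-policy behaviour under these assumed equations.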
Cite
@article{arxiv.1507.07147,
title = {True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide},
author = {Richard S. Sutton},
journal= {arXiv preprint arXiv:1507.07147},
year = {2015}
}