Related papers: Implicit Temporal Differences

Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

Implicit Updates for Average-Reward Temporal Difference Learning

Temporal difference (TD) learning is a cornerstone of reinforcement learning. In the average-reward setting, standard TD($\lambda$) is highly sensitive to the choice of step-size and thus requires careful tuning to maintain numerical…

Machine Learning · Statistics 2025-10-08 Hwanwoo Kim , Dongkyu Derek Cho , Eric Laber

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that…

Machine Learning · Computer Science 2016-07-21 Richard S. Sutton , A. Rupam Mahmood , Martha White

Schedule Based Temporal Difference Algorithms

Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD($\lambda$) is a popular class of algorithms to solve this problem. However, the weights assigned to different $n$-step…

Machine Learning · Computer Science 2021-11-24 Rohan Deb , Meet Gandhi , Shalabh Bhatnagar

Emphatic Algorithms for Deep Reinforcement Learning

Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation…

Machine Learning · Computer Science 2021-06-23 Ray Jiang , Tom Zahavy , Zhongwen Xu , Adam White , Matteo Hessel , Charles Blundell , Hado van Hasselt

An Empirical Evaluation of True Online TD({\lambda})

The true online TD({\lambda}) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD({\lambda}) algorithm, in temporal-difference learning and reinforcement learning. True online…

Artificial Intelligence · Computer Science 2015-07-03 Harm van Seijen , A. Rupam Mahmood , Patrick M. Pilarski , Richard S. Sutton

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun , Han Shen , Tianyi Chen , Dongsheng Li

Discerning Temporal Difference Learning

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Gradient Temporal Difference with Momentum: Stability and Convergence

Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy…

Machine Learning · Computer Science 2021-11-23 Rohan Deb , Shalabh Bhatnagar

Stable and Efficient Policy Evaluation

Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy. However, there are two long-standing issues lying in this prediction problem that need to be tackled:…

Machine Learning · Computer Science 2021-12-30 Daoming Lyu , Bo Liu , Matthieu Geist , Wen Dong , Saad Biaz , Qi Wang

Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize

We consider the emphatic temporal-difference (TD) algorithm, ETD($\lambda$), for learning the value functions of stationary policies in a discounted, finite state and action Markov decision process. The ETD($\lambda$) algorithm was recently…

Machine Learning · Computer Science 2017-01-23 Huizhen Yu

Gradient Iterated Temporal-Difference Learning

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science 2026-05-15 Théo Vincent , Kevin Gerhardt , Yogesh Tripathi , Habib Maraqten , Adam White , Martha White , Jan Peters , Carlo D'Eramo

True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide

This document is a guide to the implementation of true online emphatic TD($\lambda$), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and…

Machine Learning · Computer Science 2015-07-28 Richard S. Sutton

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well chosen step-sizes, yet few algorithms have been developed for setting the step-size…

Machine Learning · Computer Science 2018-04-11 Alex Kearney , Vivek Veeriah , Jaden B. Travnik , Richard S. Sutton , Patrick M. Pilarski

Reinforcement Learning From State and Temporal Differences

TD($\lambda$) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD($\lambda$) has been shown to minimise the squared error between the approximate value…

Machine Learning · Computer Science 2025-12-24 Lex Weaver , Jonathan Baxter

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key…

Artificial Intelligence · Computer Science 2016-10-25 Martha White , Adam White

Incremental Truncated LSTD

Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning. Temporal difference (TD) learning algorithms stochastically update the value function, with a linear time complexity in the…

Machine Learning · Computer Science 2016-11-21 Clement Gehring , Yangchen Pan , Martha White

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates are typically established under the assumption of linearly independent features, which does not hold…

Machine Learning · Computer Science 2025-10-15 Zixuan Xie , Xinyu Liu , Rohan Chandra , Shangtong Zhang