Related papers: Per-decision Multi-step Temporal Difference Learni…

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two…

Machine Learning · Computer Science 2019-06-20 Hugo Penedones , Carlos Riquelme , Damien Vincent , Hartmut Maennel , Timothy Mann , Andre Barreto , Sylvain Gelly , Gergely Neu

Backstepping Temporal Difference Learning

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer form divergence…

Machine Learning · Computer Science 2025-04-21 Han-Dong Lim , Donghwan Lee

Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

Control Theoretic Analysis of Temporal Difference Learning

The goal of this manuscript is to conduct a controltheoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the…

Artificial Intelligence · Computer Science 2023-09-12 Donghwan Lee , Do Wan Kim

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the…

Machine Learning · Computer Science 2022-07-18 Yunhao Tang , Mark Rowland , Rémi Munos , Bernardo Ávila Pires , Will Dabney , Marc G. Bellemare

Gradient Iterated Temporal-Difference Learning

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science 2026-05-15 Théo Vincent , Kevin Gerhardt , Yogesh Tripathi , Habib Maraqten , Adam White , Martha White , Jan Peters , Carlo D'Eramo

Predicting Periodicity with Temporal Difference Learning

Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of…

Machine Learning · Computer Science 2018-09-21 Kristopher De Asis , Brendan Bennett , Richard S. Sutton

Investigating practical linear temporal difference learning

Off-policy reinforcement learning has many applications including: learning from demonstration, learning multiple goal seeking policies in parallel, and representing predictive knowledge. Recently there has been an proliferation of new…

Machine Learning · Computer Science 2016-04-01 Adam White , Martha White

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation

Multi-step temporal-difference (TD) learning, where the update targets contain information from multiple time steps ahead, is one of the most popular forms of TD learning for linear function approximation. The reason is that multi-step…

Artificial Intelligence · Computer Science 2016-08-19 Harm van Seijen

Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation

This paper analyzes multi-step temporal difference (TD)-learning algorithms within the ``deadly triad'' scenario, characterized by linear function approximation, off-policy learning, and bootstrapping. In particular, we prove that $n$-step…

Machine Learning · Computer Science 2026-02-24 Han-Dong Lim , Donghwan Lee

Multi-State TD Target for Model-Free Reinforcement Learning

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by…

Machine Learning · Computer Science 2024-08-05 Wuhao Wang , Zhiyong Chen , Lepeng Zhang

Towards Parameter-Free Temporal Difference Learning

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li , Mark Schmidt , Reza Babanezhad , Sharan Vaswani

Extending Differential Temporal Difference Methods for Episodic Problems

Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This…

Machine Learning · Computer Science 2026-05-07 Kris De Asis , Mohamed Elsayed , Jiamin He

TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning

Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear function approximators. These results suggest that RL methods that…

Machine Learning · Computer Science 2018-06-05 Artemij Amiranashvili , Alexey Dosovitskiy , Vladlen Koltun , Thomas Brox

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies.…

Machine Learning · Computer Science 2020-06-17 Mingde Zhao

Directly Estimating the Variance of the {\lambda}-Return Using Temporal-Difference Methods

This paper investigates estimating the variance of a temporal-difference learning agent's update target. Most reinforcement learning methods use an estimate of the value function, which captures how good it is for the agent to be in a…

Artificial Intelligence · Computer Science 2018-02-15 Craig Sherstan , Brendan Bennett , Kenny Young , Dylan R. Ashley , Adam White , Martha White , Richard S. Sutton

Gradient Temporal Difference with Momentum: Stability and Convergence

Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy…

Machine Learning · Computer Science 2021-11-23 Rohan Deb , Shalabh Bhatnagar

Correcting Momentum in Temporal Difference Learning

A common optimization tool used in deep reinforcement learning is momentum, which consists in accumulating and discounting past gradients, reapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal…

Machine Learning · Computer Science 2021-06-09 Emmanuel Bengio , Joelle Pineau , Doina Precup

Implicit Updates for Average-Reward Temporal Difference Learning

Temporal difference (TD) learning is a cornerstone of reinforcement learning. In the average-reward setting, standard TD($\lambda$) is highly sensitive to the choice of step-size and thus requires careful tuning to maintain numerical…

Machine Learning · Statistics 2025-10-08 Hwanwoo Kim , Dongkyu Derek Cho , Eric Laber