Related papers: Differential TD Learning for Value Function Approx…

Differential Temporal Difference Learning

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to…

Machine Learning · Computer Science 2020-03-02 Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari, Daniel Russo, Raghav Singal

Bridging the Gap Between Average and Discounted TD Learning

The analysis of Temporal Difference (TD) learning in the average-reward setting faces notable theoretical difficulties because the Bellman operator is not contractive with respect to any norm. This complicates standard analyses of…

Machine Learning · Computer Science 2026-05-05 Haoxing Tian, Zaiwei Chen, Ioannis Ch. Paschalidis, Alex Olshevsky

Accelerated Distributional Temporal Difference Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…

Machine Learning · Statistics 2025-11-18 Kaicheng Jin, Yang Peng, Jiansheng Yang, Zhihua Zhang

Towards Parameter-Free Temporal Difference Learning

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li, Mark Schmidt, Reza Babanezhad, Sharan Vaswani

Temporal Difference Learning as Gradient Splitting

Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in…

Machine Learning · Computer Science 2020-10-29 Rui Liu, Alex Olshevsky

$\ell_1$ Regularized Gradient Temporal-Difference Learning

In this paper, we study the Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. Recent…

Artificial Intelligence · Computer Science 2016-10-06 Dominik Meyer, Hao Shen, Klaus Diepold

Discerning Temporal Difference Learning

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Incrementally Learning Functions of the Return

Temporal difference methods enable efficient estimation of value functions in reinforcement learning in an incremental fashion, and are of broader interest because they correspond to learning as observed in biological systems. Standard value…

Machine Learning · Computer Science 2019-07-11 Brendan Bennett, Wesley Chung, Muhammad Zaheer, Vincent Liu

On the Statistical Benefits of Temporal Difference Learning

Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data. Temporal difference learning (TD) methods instead fit value functions by…

Machine Learning · Computer Science 2024-02-15 David Cheikhi, Daniel Russo

Gradient Iterated Temporal-Difference Learning

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science 2026-05-15 Théo Vincent, Kevin Gerhardt, Yogesh Tripathi, Habib Maraqten, Adam White, Martha White, Jan Peters, Carlo D'Eramo

O$^2$TD: (Near)-Optimal Off-Policy TD Learning

Temporal difference learning and Residual Gradient methods are the most widely used temporal difference based learning algorithms; however, it has been shown that none of their objective functions is optimal w.r.t. approximating the true…

Machine Learning · Computer Science 2017-04-21 Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz

Primal-Dual Distributed Temporal Difference Learning

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). The temporal-difference (TD) learning is a reinforcement…

Optimization and Control · Mathematics 2020-04-29 Donghwan Lee, Jianghai Hu

Predicting Periodicity with Temporal Difference Learning

Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of…

Machine Learning · Computer Science 2018-09-21 Kristopher De Asis, Brendan Bennett, Richard S. Sutton

Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version

Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied…

Machine Learning · Computer Science 2026-04-08 Masoud S. Sakha, Rushikesh Kamalapurkar, Sean Meyn

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the…

Machine Learning · Statistics 2026-02-25 Weichen Wu, Gen Li, Yuting Wei, Alessandro Rinaldo

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several…

Machine Learning · Computer Science 2018-03-30 Huizhen Yu

Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim, Panos Toulis, Eric Laber

A Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted…

Machine Learning · Statistics 2025-05-14 Yang Peng, Kaicheng Jin, Liangyu Zhang, Zhihua Zhang