Related papers: An MRP Formulation for Supervised Learning: Genera…

Differential Temporal Difference Learning

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to…

Machine Learning · Computer Science 2020-03-02 Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn

Bridging the Gap Between Average and Discounted TD Learning

The analysis of Temporal Difference (TD) learning in the average-reward setting faces notable theoretical difficulties because the Bellman operator is not contractive with respect to any norm. This complicates standard analyses of…

Machine Learning · Computer Science 2026-05-05 Haoxing Tian, Zaiwei Chen, Ioannis Ch. Paschalidis, Alex Olshevsky

Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). The temporal difference (TD) learning is a reinforcement learning (RL)…

Optimization and Control · Mathematics 2018-08-23 Donghwan Lee, Hyungjin Yoon, Naira Hovakimyan

Primal-Dual Distributed Temporal Difference Learning

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). The temporal-difference (TD) learning is a reinforcement…

Optimization and Control · Mathematics 2020-04-29 Donghwan Lee, Jianghai Hu

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari, Daniel Russo, Raghav Singal

Temporal-difference learning with nonlinear function approximation: lazy training and mean field regimes

We discuss the approximation of the value function for infinite-horizon discounted Markov Reward Processes (MRP) with nonlinear functions trained with the Temporal-Difference (TD) learning algorithm. We first consider this problem under a…

Machine Learning · Computer Science 2024-02-05 Andrea Agazzi, Jianfeng Lu

Accelerated Distributional Temporal Difference Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…

Machine Learning · Statistics 2025-11-18 Kaicheng Jin, Yang Peng, Jiansheng Yang, Zhihua Zhang

On Generalized Bellman Equations and Temporal-Difference Learning

We consider off-policy temporal-difference (TD) learning in discounted Markov decision processes, where the goal is to evaluate a policy in a model-free way by using observations of a state process generated without executing the policy. To…

Machine Learning · Computer Science 2018-11-27 Huizhen Yu, A. Rupam Mahmood, Richard S. Sutton

Differential TD Learning for Value Function Approximation

Value functions arise as a component of algorithms as well as performance metrics in statistics and engineering applications. Computation of the associated Bellman equations is numerically challenging in all but a few special cases. A…

Systems and Control · Computer Science 2018-12-27 Adithya M. Devraj, Sean P. Meyn

Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning

In large-scale distributed machine learning, recent works have studied the effects of compressing gradients in stochastic optimization to alleviate the communication bottleneck. These works have collectively revealed that stochastic…

Machine Learning · Computer Science 2024-06-05 Aritra Mitra, George J. Pappas, Hamed Hassani

A Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted…

Machine Learning · Statistics 2025-05-14 Yang Peng, Kaicheng Jin, Liangyu Zhang, Zhihua Zhang

Robust Real-Time Mortality Prediction in the Intensive Care Unit using Temporal Difference Learning

The task of predicting long-term patient outcomes using supervised machine learning is a challenging one, in part because of the high variance of each patient's trajectory, which can result in the model over-fitting to the training data.…

Machine Learning · Computer Science 2026-02-09 Thomas Frost, Kezhi Li, Steve Harris

Temporal-Difference Networks

We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each…

Machine Learning · Computer Science 2015-04-22 Richard S. Sutton, Brian Tanner

Generalized Linear Markov Decision Process

The linear Markov Decision Process (MDP) framework offers a principled foundation for reinforcement learning (RL) with strong theoretical guarantees and sample efficiency. However, its restrictive assumption, that both transition dynamics…

Machine Learning · Statistics 2025-06-03 Sinian Zhang, Kaicheng Zhang, Ziping Xu, Tianxi Cai, Doudou Zhou

Finite-Time Accuracy of Temporal-Difference Learning Under Schur-Stable Recursions

Temporal difference (TD) learning is a cornerstone reinforcement learning (RL) method for policy evaluation, where the goal is to estimate the value function of a Markov decision process under a fixed policy. While a substantial body of…

Machine Learning · Computer Science 2026-02-02 Donghwan Lee, Do Wan Kim

Finite-Time Analysis of Temporal Difference Learning with Experience Replay

Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, it has only been recently that researchers have begun to actively study its finite time…

Machine Learning · Computer Science 2025-04-16 Han-Dong Lim, Donghwan Lee

Control Theoretic Analysis of Temporal Difference Learning

The goal of this manuscript is to conduct a control-theoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the…

Artificial Intelligence · Computer Science 2023-09-12 Donghwan Lee, Do Wan Kim

Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes

The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL as…

Machine Learning · Computer Science 2026-02-19 Ethan Blaser, Jiuqi Wang, Shangtong Zhang

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Learning sparse representations in reinforcement learning

Reinforcement learning (RL) algorithms allow artificial agents to improve their selection of actions to increase rewarding experiences in their environments. Temporal Difference (TD) Learning -- a model-free RL method -- is a leading…

Machine Learning · Computer Science 2019-09-05 Jacob Rafati, David C. Noelle