Related papers: Regularized Centered Emphatic Temporal Difference …

Should All Temporal Difference Learning Use Emphasis?

Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address convergence issues of conventional Temporal Difference (TD) learning under off-policy…

Artificial Intelligence · Computer Science 2019-03-04 Xiang Gu , Sina Ghiassian , Richard S. Sutton

Emphatic Algorithms for Deep Reinforcement Learning

Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation…

Machine Learning · Computer Science 2021-06-23 Ray Jiang , Tom Zahavy , Zhongwen Xu , Adam White , Matteo Hessel , Charles Blundell , Hado van Hasselt

Emphatic TD Bellman Operator is a Contraction

Recently, \citet{SuttonMW15} introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD…

Machine Learning · Statistics 2015-08-25 Assaf Hallak , Aviv Tamar , Shie Mannor

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced \emph{emphatic temporal differences} (ETD) algorithm \citep{SuttonMW15}, which…

Machine Learning · Statistics 2015-11-30 Assaf Hallak , Aviv Tamar , Remi Munos , Shie Mannor

PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method

Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a successful method to conduct the off-policy value function evaluation with function approximation. Although ETD has been shown to converge asymptotically to a desirable…

Machine Learning · Computer Science 2022-07-18 Ziwei Guan , Tengyu Xu , Yingbin Liang

A First Empirical Study of Emphatic Temporal Difference Learning

In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy…

Artificial Intelligence · Computer Science 2017-05-15 Sina Ghiassian , Banafsheh Rafiee , Richard S. Sutton

On Convergence of Emphatic Temporal-Difference Learning

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved…

Machine Learning · Computer Science 2017-12-29 Huizhen Yu

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that…

Machine Learning · Computer Science 2016-07-21 Richard S. Sutton , A. Rupam Mahmood , Martha White

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods involving the use of followon traces. Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of…

Machine Learning · Computer Science 2022-05-12 Shangtong Zhang , Shimon Whiteson

Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize

We consider the emphatic temporal-difference (TD) algorithm, ETD($\lambda$), for learning the value functions of stationary policies in a discounted, finite state and action Markov decision process. The ETD($\lambda$) algorithm was recently…

Machine Learning · Computer Science 2017-01-23 Huizhen Yu

Discerning Temporal Difference Learning

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

Temporal-difference learning with function approximation can be unstable under off-policy sampling. TDC stabilizes off-policy TD through an auxiliary covariance correction, and TDRC further regularizes this correction in a single-timescale…

Artificial Intelligence · Computer Science 2026-05-29 Xingguo Chen , Zhiang He , Yuchen Shen , Shangdong Yang , Chao Li , Guang Yang , Wenhao Wang

On Generalized Bellman Equations and Temporal-Difference Learning

We consider off-policy temporal-difference (TD) learning in discounted Markov decision processes, where the goal is to evaluate a policy in a model-free way by using observations of a state process generated without executing the policy. To…

Machine Learning · Computer Science 2018-11-27 Huizhen Yu , A. Rupam Mahmood , Richard S. Sutton

Backstepping Temporal Difference Learning

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer form divergence…

Machine Learning · Computer Science 2025-04-21 Han-Dong Lim , Donghwan Lee

Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice…

Machine Learning · Computer Science 2024-09-20 Gandharv Patil , Prashanth L. A. , Dheeraj Nagaraj , Doina Precup

$\ell_1$ Regularized Gradient Temporal-Difference Learning

In this paper, we study the Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. Recent…

Artificial Intelligence · Computer Science 2016-10-06 Dominik Meyer , Hao Shen , Klaus Diepold

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

In temporal difference (TD) learning, off-policy sampling is known to be more practical than on-policy sampling, and by decoupling learning from data collection, it enables data reuse. It is known that policy evaluation (including…

Machine Learning · Computer Science 2021-06-25 Zaiwei Chen , Siva Theja Maguluri , Sanjay Shakkottai , Karthikeyan Shanmugam

Bridging the Gap Between Average and Discounted TD Learning

The analysis of Temporal Difference (TD) learning in the average-reward setting faces notable theoretical difficulties because the Bellman operator is not contractive with respect to any norm. This complicates standard analyses of…

Machine Learning · Computer Science 2026-05-05 Haoxing Tian , Zaiwei Chen , Ioannis Ch. Paschalidis , Alex Olshevsky

Learning Expected Emphatic Traces for Deep RL

Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods. When combined with function approximation, such as neural networks, this combination is known as…

Machine Learning · Computer Science 2021-07-13 Ray Jiang , Shangtong Zhang , Veronica Chelu , Adam White , Hado van Hasselt