Related papers: PER-ETD: A Polynomially Efficient Emphatic Tempora…

Should All Temporal Difference Learning Use Emphasis?

Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address convergence issues of conventional Temporal Difference (TD) learning under off-policy…

Artificial Intelligence · Computer Science 2019-03-04 Xiang Gu, Sina Ghiassian, Richard S. Sutton

A First Empirical Study of Emphatic Temporal Difference Learning

In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy…

Artificial Intelligence · Computer Science 2017-05-15 Sina Ghiassian, Banafsheh Rafiee, Richard S. Sutton

Emphatic Algorithms for Deep Reinforcement Learning

Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation…

Machine Learning · Computer Science 2021-06-23 Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

Regularized Centered Emphatic Temporal Difference Learning

Off-policy temporal-difference (TD) learning with function approximation faces a structural tradeoff among stability, projection geometry, and variance control. Emphatic TD (ETD) improves the off-policy projection geometry through follow-on…

Artificial Intelligence · Computer Science 2026-05-07 Xingguo Chen, Chaohui Wu, Jinguo Ye, Chao Li, Shangdong Yang, Guang Yang, Tianyu Liang, Wenhao Wang

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that…

Machine Learning · Computer Science 2016-07-21 Richard S. Sutton, A. Rupam Mahmood, Martha White

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm (Sutton, Mahmood, and White, 2015), which…

Machine Learning · Statistics 2015-11-30 Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize

We consider the emphatic temporal-difference (TD) algorithm, ETD($\lambda$), for learning the value functions of stationary policies in a discounted, finite state and action Markov decision process. The ETD($\lambda$) algorithm was recently…

Machine Learning · Computer Science 2017-01-23 Huizhen Yu

Emphatic Temporal-Difference Learning

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White…

Machine Learning · Computer Science 2015-07-07 A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

Learning Expected Emphatic Traces for Deep RL

Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods. When combined with function approximation, such as neural networks, this combination is known as…

Machine Learning · Computer Science 2021-07-13 Ray Jiang, Shangtong Zhang, Veronica Chelu, Adam White, Hado van Hasselt

On Convergence of Emphatic Temporal-Difference Learning

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved…

Machine Learning · Computer Science 2017-12-29 Huizhen Yu

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies.…

Machine Learning · Computer Science 2020-06-17 Mingde Zhao

META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies.…

Machine Learning · Computer Science 2020-05-19 Mingde Zhao, Sitao Luan, Ian Porada, Xiao-Wen Chang, Doina Precup

Gradient Iterated Temporal-Difference Learning

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science 2026-05-15 Théo Vincent, Kevin Gerhardt, Yogesh Tripathi, Habib Maraqten, Adam White, Martha White, Jan Peters, Carlo D'Eramo

Episodic Novelty Through Temporal Distance

Exploration in sparse reward environments remains a significant challenge in reinforcement learning, particularly in Contextual Markov Decision Processes (CMDPs), where environments differ across episodes. Existing episodic intrinsic…

Machine Learning · Computer Science 2025-01-28 Yuhua Jiang, Qihan Liu, Yiqin Yang, Xiaoteng Ma, Dianyu Zhong, Hao Hu, Jun Yang, Bin Liang, Bo Xu, Chongjie Zhang, Qianchuan Zhao

Emphatic TD Bellman Operator is a Contraction

Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD…

Machine Learning · Statistics 2015-08-25 Assaf Hallak, Aviv Tamar, Shie Mannor

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Discerning Temporal Difference Learning

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

Preferential Temporal Difference Learning

Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD learning updates states whenever they are…

Machine Learning · Computer Science 2021-08-24 Nishanth Anand, Doina Precup

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods involving the use of followon traces. Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of…

Machine Learning · Computer Science 2022-05-12 Shangtong Zhang, Shimon Whiteson

Some Simulation Results for Emphatic Temporal-Difference Learning Algorithms

This is a companion note to our recent study of the weak convergence properties of constrained emphatic temporal-difference learning (ETD) algorithms from a theoretic perspective. It supplements the latter analysis with simulation results…

Machine Learning · Computer Science 2016-05-10 Huizhen Yu