Related papers: Some Simulation Results for Emphatic Temporal-Diff…

Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize

We consider the emphatic temporal-difference (TD) algorithm, ETD($\lambda$), for learning the value functions of stationary policies in a discounted, finite state and action Markov decision process. The ETD($\lambda$) algorithm was recently…

Machine Learning · Computer Science 2017-01-23 Huizhen Yu

A First Empirical Study of Emphatic Temporal Difference Learning

In this paper we present the first empirical study of the emphatic temporal-difference learning algorithm (ETD), comparing it with conventional temporal-difference learning, in particular, with linear TD(0), on on-policy and off-policy…

Artificial Intelligence · Computer Science 2017-05-15 Sina Ghiassian, Banafsheh Rafiee, Richard S. Sutton

Emphatic Algorithms for Deep Reinforcement Learning

Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation…

Machine Learning · Computer Science 2021-06-23 Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

Should All Temporal Difference Learning Use Emphasis?

Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address convergence issues of conventional Temporal Difference (TD) learning under off-policy…

Artificial Intelligence · Computer Science 2019-03-04 Xiang Gu, Sina Ghiassian, Richard S. Sutton

Emphatic Temporal-Difference Learning

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White…

Machine Learning · Computer Science 2015-07-07 A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

On Convergence of Emphatic Temporal-Difference Learning

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved…

Machine Learning · Computer Science 2017-12-29 Huizhen Yu

TD Convergence: An Optimization Perspective

We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where…

Machine Learning · Computer Science 2023-11-10 Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm (Sutton, Mahmood, and White, 2015), which…

Machine Learning · Statistics 2015-11-30 Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

$\ell_1$ Regularized Gradient Temporal-Difference Learning

In this paper, we study the Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. Recent…

Artificial Intelligence · Computer Science 2016-10-06 Dominik Meyer, Hao Shen, Klaus Diepold

An Analysis of Quantile Temporal-Difference Learning

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these…

Machine Learning · Computer Science 2024-05-21 Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method

Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a successful method to conduct the off-policy value function evaluation with function approximation. Although ETD has been shown to converge asymptotically to a desirable…

Machine Learning · Computer Science 2022-07-18 Ziwei Guan, Tengyu Xu, Yingbin Liang

Emphatic TD Bellman Operator is a Contraction

Recently, Sutton, Mahmood, and White (2015) introduced the emphatic temporal differences (ETD) algorithm for off-policy evaluation in Markov decision processes. In this short note, we show that the projected fixed-point equation that underlies ETD…

Machine Learning · Statistics 2015-08-25 Assaf Hallak, Aviv Tamar, Shie Mannor

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods involving the use of follow-on traces. Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of…

Machine Learning · Computer Science 2022-05-12 Shangtong Zhang, Shimon Whiteson

Learning Linear Temporal Properties

We present two novel algorithms for learning formulas in Linear Temporal Logic (LTL) from examples. The first learning algorithm reduces the learning task to a series of satisfiability problems in propositional Boolean logic and produces a…

Logic in Computer Science · Computer Science 2018-10-05 Daniel Neider, Ivan Gavran

A Feature-Based Analysis on the Impact of Set of Constraints for ε-Constrained Differential Evolution

Different types of evolutionary algorithms have been developed for constrained continuous optimization. We carry out a feature-based analysis of evolved constrained continuous optimization instances to understand the characteristics of…

Neural and Evolutionary Computing · Computer Science 2015-06-24 Shayan Poursoltan, Frank Neumann

Control Theoretic Analysis of Temporal Difference Learning

The goal of this manuscript is to conduct a control-theoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the…

Artificial Intelligence · Computer Science 2023-09-12 Donghwan Lee, Do Wan Kim

Regularized Centered Emphatic Temporal Difference Learning

Off-policy temporal-difference (TD) learning with function approximation faces a structural tradeoff among stability, projection geometry, and variance control. Emphatic TD (ETD) improves the off-policy projection geometry through follow-on…

Artificial Intelligence · Computer Science 2026-05-07 Xingguo Chen, Chaohui Wu, Jinguo Ye, Chao Li, Shangdong Yang, Guang Yang, Tianyu Liang, Wenhao Wang

Backstepping Temporal Difference Learning

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence…

Machine Learning · Computer Science 2025-04-21 Han-Dong Lim, Donghwan Lee

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari, Daniel Russo, Raghav Singal

Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version

Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied…

Machine Learning · Computer Science 2026-04-08 Masoud S. Sakha, Rushikesh Kamalapurkar, Sean Meyn