English
Related papers

Related papers: PID Accelerated Temporal Difference Algorithms

200 papers

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science 2026-05-15 Théo Vincent , Kevin Gerhardt , Yogesh Tripathi , Habib Maraqten , Adam White , Martha White , Jan Peters , Carlo D'Eramo

Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of…

Machine Learning · Computer Science 2018-09-21 Kristopher De Asis , Brendan Bennett , Richard S. Sutton

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

We study policy evaluation problems in multi-task reinforcement learning (RL) under a low-rank representation setting. In this setting, we are given $N$ learning tasks where the corresponding value function of these tasks lie in an…

Machine Learning · Computer Science 2025-03-05 Yitao Bai , Sihan Zeng , Justin Romberg , Thinh T. Doan

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these…

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer form divergence…

Machine Learning · Computer Science 2025-04-21 Han-Dong Lim , Donghwan Lee

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun , Han Shen , Tianyi Chen , Dongsheng Li

We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this…

Machine Learning · Computer Science 2023-05-31 Mark Rowland , Yunhao Tang , Clare Lyle , Rémi Munos , Marc G. Bellemare , Will Dabney

The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal…

Machine Learning · Computer Science 2019-09-24 Donghwan Lee , Niao He

Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied…

Machine Learning · Computer Science 2026-04-08 Masoud S. Sakha , Rushikesh Kamalapurkar , Sean Meyn

Temporal difference (TD) learning is a popular algorithm for policy evaluation in reinforcement learning, but the vanilla TD can substantially suffer from the inherent optimization variance. A variance reduced TD (VRTD) algorithm was…

Machine Learning · Computer Science 2020-01-13 Tengyu Xu , Zhe Wang , Yi Zhou , Yingbin Liang

In this paper, we study the Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. Recent…

Artificial Intelligence · Computer Science 2016-10-06 Dominik Meyer , Hao Shen , Klaus Diepold

It is still common to use Q-learning and temporal difference (TD) learning-even though they have divergence issues and sound Gradient TD alternatives exist-because divergence seems rare and they typically perform well. However, recent work…

Machine Learning · Computer Science 2020-09-21 Sina Ghiassian , Andrew Patterson , Shivam Garg , Dhawal Gupta , Adam White , Martha White

Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. They address…

Machine Learning · Computer Science 2018-09-10 Kristopher De Asis , Richard S. Sutton

We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where…

Machine Learning · Computer Science 2023-11-10 Kavosh Asadi , Shoham Sabach , Yao Liu , Omer Gottesman , Rasool Fakoor

Temporal difference (TD) learning is a policy evaluation in reinforcement learning whose performance can be enhanced by variance reduction methods. Recently, multiple works have sought to fuse TD learning with Stochastic Variance Reduced…

Machine Learning · Computer Science 2024-08-07 Arsenii Mustafin , Alex Olshevsky , Ioannis Ch. Paschalidis

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to…

Machine Learning · Computer Science 2020-03-02 Adithya M. Devraj , Ioannis Kontoyiannis , Sean P. Meyn
‹ Prev 1 2 3 10 Next ›