English
Related papers

Related papers: Predicting Periodicity with Temporal Difference Le…

200 papers

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We…

Machine Learning · Computer Science 2023-12-20 Nishanth Anand , Doina Precup

Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD learning updates states whenever they are…

Machine Learning · Computer Science 2021-08-24 Nishanth Anand , Doina Precup

In many finite horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return - in settings like Atari, for instance, the goal is to collect the most points while staying alive in the long…

Machine Learning · Computer Science 2019-05-28 Joshua Romoff , Peter Henderson , Ahmed Touati , Emma Brunskill , Joelle Pineau , Yann Ollivier

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data. Temporal difference learning (TD) methods instead fit value functions by…

Machine Learning · Computer Science 2024-02-15 David Cheikhi , Daniel Russo

Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in…

Machine Learning · Computer Science 2020-10-29 Rui Liu , Alex Olshevsky

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two…

Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This…

Machine Learning · Computer Science 2026-05-07 Kris De Asis , Mohamed Elsayed , Jiamin He

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

The goal of this manuscript is to conduct a controltheoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the…

Artificial Intelligence · Computer Science 2023-09-12 Donghwan Lee , Do Wan Kim

Multi-step temporal difference (TD) learning is an important approach in reinforcement learning, as it unifies one-step TD learning with Monte Carlo methods in a way where intermediate algorithms can outperform either extreme. They address…

Machine Learning · Computer Science 2018-09-10 Kristopher De Asis , Richard S. Sutton

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li , Mark Schmidt , Reza Babanezhad , Sharan Vaswani

We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this…

Machine Learning · Computer Science 2023-05-31 Mark Rowland , Yunhao Tang , Clare Lyle , Rémi Munos , Marc G. Bellemare , Will Dabney

Temporal difference methods enable efficient estimation of value functions in reinforcement learning in an incremental fashion, and are of broader interest because they correspond learning as observed in biological systems. Standard value…

Machine Learning · Computer Science 2019-07-11 Brendan Bennett , Wesley Chung , Muhammad Zaheer , Vincent Liu

Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied…

Machine Learning · Computer Science 2026-04-08 Masoud S. Sakha , Rushikesh Kamalapurkar , Sean Meyn

This paper investigates estimating the variance of a temporal-difference learning agent's update target. Most reinforcement learning methods use an estimate of the value function, which captures how good it is for the agent to be in a…

Artificial Intelligence · Computer Science 2018-02-15 Craig Sherstan , Brendan Bennett , Kenny Young , Dylan R. Ashley , Adam White , Martha White , Richard S. Sutton

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by…

Machine Learning · Computer Science 2024-08-05 Wuhao Wang , Zhiyong Chen , Lepeng Zhang

Value functions arise as a component of algorithms as well as performance metrics in statistics and engineering applications. Computation of the associated Bellman equations is numerically challenging in all but a few special cases. A…

Systems and Control · Computer Science 2018-12-27 Adithya M. Devraj , Sean P. Meyn
‹ Prev 1 2 3 10 Next ›