Related papers: True Online Temporal-Difference Learning

An Empirical Evaluation of True Online TD($\lambda$)

The true online TD($\lambda$) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD($\lambda$) algorithm, in temporal-difference learning and reinforcement learning. True online…

Artificial Intelligence · Computer Science 2015-07-03 Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q($\sigma$) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the…

Artificial Intelligence · Computer Science 2017-11-07 Markus Dumke

True Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide

This document is a guide to the implementation of true online emphatic TD($\lambda$), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and…

Machine Learning · Computer Science 2015-07-28 Richard S. Sutton

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari, Daniel Russo, Raghav Singal

Implicit Temporal Differences

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD($\lambda$) is its sensitivity…

Machine Learning · Statistics 2014-12-23 Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

Discerning Temporal Difference Learning

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key…

Artificial Intelligence · Computer Science 2016-10-25 Martha White, Adam White

True Online TD-Replan(lambda) Achieving Planning through Replaying

In this paper, we develop a new planning method that extends the capabilities of the true online TD to allow an agent to efficiently replay all or part of its past experience, online in the sequence that they appear with, either in each…

Machine Learning · Computer Science 2025-02-03 Abdulrahman Altahhan

Implicit Updates for Average-Reward Temporal Difference Learning

Temporal difference (TD) learning is a cornerstone of reinforcement learning. In the average-reward setting, standard TD($\lambda$) is highly sensitive to the choice of step-size and thus requires careful tuning to maintain numerical…

Machine Learning · Statistics 2025-10-08 Hwanwoo Kim, Dongkyu Derek Cho, Eric Laber

TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning

Our understanding of reinforcement learning (RL) has been shaped by theoretical and empirical results that were obtained decades ago using tabular representations and linear function approximators. These results suggest that RL methods that…

Machine Learning · Computer Science 2018-06-05 Artemij Amiranashvili, Alexey Dosovitskiy, Vladlen Koltun, Thomas Brox

Source Traces for Temporal Difference Learning

This paper motivates and develops source traces for temporal difference (TD) learning in the tabular setting. Source traces are like eligibility traces, but model potential histories rather than immediate ones. This allows TD errors to be…

Machine Learning · Computer Science 2019-02-11 Silviu Pitis

Temporal-Differential Learning in Continuous Environments

In this paper, a new reinforcement learning (RL) method known as the method of temporal differential is introduced. Compared to the traditional temporal-difference learning method, it plays a crucial role in developing novel RL techniques…

Machine Learning · Computer Science 2020-06-02 Tao Bian, Zhong-Ping Jiang

Control Theoretic Analysis of Temporal Difference Learning

The goal of this manuscript is to conduct a control-theoretic analysis of Temporal Difference (TD) learning algorithms. TD-learning serves as a cornerstone in the realm of reinforcement learning, offering a methodology for approximating the…

Artificial Intelligence · Computer Science 2023-09-12 Donghwan Lee, Do Wan Kim

Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim, Panos Toulis, Eric Laber

On the Statistical Benefits of Temporal Difference Learning

Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data. Temporal difference learning (TD) methods instead fit value functions by…

Machine Learning · Computer Science 2024-02-15 David Cheikhi, Daniel Russo

Learning sparse representations in reinforcement learning

Reinforcement learning (RL) algorithms allow artificial agents to improve their selection of actions to increase rewarding experiences in their environments. Temporal Difference (TD) Learning -- a model-free RL method -- is a leading…

Machine Learning · Computer Science 2019-09-05 Jacob Rafati, David C. Noelle

Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation

Multi-step temporal-difference (TD) learning, where the update targets contain information from multiple time steps ahead, is one of the most popular forms of TD learning for linear function approximation. The reason is that multi-step…

Artificial Intelligence · Computer Science 2016-08-19 Harm van Seijen

Segmenting Action-Value Functions Over Time-Scales in SARSA via TD($\Delta$)

In numerous episodic reinforcement learning (RL) environments, SARSA-based methodologies are employed to enhance policies aimed at maximizing returns over long horizons. Traditional SARSA algorithms face challenges in achieving an optimal…

Machine Learning · Computer Science 2025-09-05 Mahammad Humayoo

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies.…

Machine Learning · Computer Science 2020-06-17 Mingde Zhao