Related papers: TDprop: Does Jacobi Preconditioning Help Temporal …

Temporal Difference Updating without a Learning Rate

We derive an equation for temporal difference learning from statistical principles. Specifically, we start with the variational principle and then bootstrap to produce an updating rule for discounted state value estimates. The resulting…

Machine Learning · Computer Science 2008-11-03 Marcus Hutter , Shane Legg

Equilibrated adaptive learning rates for non-convex optimization

Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the…

Machine Learning · Computer Science 2015-09-01 Yann N. Dauphin , Harm de Vries , Yoshua Bengio

Gradient Iterated Temporal-Difference Learning

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science 2026-05-15 Théo Vincent , Kevin Gerhardt , Yogesh Tripathi , Habib Maraqten , Adam White , Martha White , Jan Peters , Carlo D'Eramo

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well chosen step-sizes, yet few algorithms have been developed for setting the step-size…

Machine Learning · Computer Science 2018-04-11 Alex Kearney , Vivek Veeriah , Jaden B. Travnik , Richard S. Sutton , Patrick M. Pilarski

META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies.…

Machine Learning · Computer Science 2020-06-17 Mingde Zhao

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key…

Artificial Intelligence · Computer Science 2016-10-25 Martha White , Adam White

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that…

Machine Learning · Computer Science 2016-07-21 Richard S. Sutton , A. Rupam Mahmood , Martha White

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

Improving Adaptive Moment Optimization via Preconditioner Diagonalization

Modern adaptive optimization methods, such as Adam and its variants, have emerged as the most widely used tools in deep learning over recent years. These algorithms offer automatic mechanisms for dynamically adjusting the update step based…

Machine Learning · Computer Science 2025-02-12 Son Nguyen , Bo Liu , Lizhang Chen , Qiang Liu

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun , Han Shen , Tianyi Chen , Dongsheng Li

META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy, as well as algorithms which learn how to improve policies.…

Machine Learning · Computer Science 2020-05-19 Mingde Zhao , Sitao Luan , Ian Porada , Xiao-Wen Chang , Doina Precup

Correcting Momentum in Temporal Difference Learning

A common optimization tool used in deep reinforcement learning is momentum, which consists in accumulating and discounting past gradients, reapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal…

Machine Learning · Computer Science 2021-06-09 Emmanuel Bengio , Joelle Pineau , Doina Precup

Predicting Periodicity with Temporal Difference Learning

Temporal difference (TD) learning is an important approach in reinforcement learning, as it combines ideas from dynamic programming and Monte Carlo methods in a way that allows for online and incremental model-free learning. A key idea of…

Machine Learning · Computer Science 2018-09-21 Kristopher De Asis , Brendan Bennett , Richard S. Sutton

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

In this paper, we provide a unified analysis of temporal difference learning algorithms with linear function approximators by exploiting their connections to Markov jump linear systems (MJLS). We tailor the MJLS theory developed in the…

Machine Learning · Computer Science 2019-11-06 Bin Hu , Usman Ahmed Syed

Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning

In large-scale distributed machine learning, recent works have studied the effects of compressing gradients in stochastic optimization to alleviate the communication bottleneck. These works have collectively revealed that stochastic…

Machine Learning · Computer Science 2024-06-05 Aritra Mitra , George J. Pappas , Hamed Hassani

Cumulative Learning Rate Adaptation: Revisiting Path-Based Schedules for SGD and Adam

The learning rate is a crucial hyperparameter in deep learning, with its ideal value depending on the problem and potentially changing during training. In this paper, we investigate the practical utility of adaptive learning rate mechanisms…

Machine Learning · Computer Science 2025-08-08 Asma Atamna , Tom Maus , Fabian Kievelitz , Tobias Glasmachers

TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning

Latent prediction--where agents learn by predicting their own latents--has emerged as a powerful paradigm for training general representations in machine learning. In reinforcement learning (RL), this approach has been explored to define…

Machine Learning · Computer Science 2025-10-02 Marco Bagatella , Matteo Pirotta , Ahmed Touati , Alessandro Lazaric , Andrea Tirinzoni

Towards Parameter-Free Temporal Difference Learning

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li , Mark Schmidt , Reza Babanezhad , Sharan Vaswani