Related papers: Reinforcement Learning From State and Temporal Dif…

Predictive State Temporal Difference Learning

We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that…

Machine Learning · Computer Science 2015-03-17 Byron Boots , Geoffrey J. Gordon

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun , Han Shen , Tianyi Chen , Dongsheng Li

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that…

Machine Learning · Computer Science 2016-07-21 Richard S. Sutton , A. Rupam Mahmood , Martha White

Discerning Temporal Difference Learning

Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction…

Machine Learning · Computer Science 2024-02-13 Jianfei Ma

Emphatic Algorithms for Deep Reinforcement Learning

Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation…

Machine Learning · Computer Science 2021-06-23 Ray Jiang , Tom Zahavy , Zhongwen Xu , Adam White , Matteo Hessel , Charles Blundell , Hado van Hasselt

Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable,…

Machine Learning · Computer Science 2024-11-18 Cameron Allen , Aaron Kirtland , Ruo Yu Tao , Sam Lobel , Daniel Scott , Nicholas Petrocelli , Omer Gottesman , Ronald Parr , Michael L. Littman , George Konidaris

Implicit Temporal Differences

In reinforcement learning, the TD($\lambda$) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD($\lambda$) is its sensitivity…

Machine Learning · Statistics 2014-12-23 Aviv Tamar , Panos Toulis , Shie Mannor , Edoardo M. Airoldi

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates are typically established under the assumption of linearly independent features, which does not hold…

Machine Learning · Computer Science 2025-10-15 Zixuan Xie , Xinyu Liu , Rohan Chandra , Shangtong Zhang

Multi-State TD Target for Model-Free Reinforcement Learning

Temporal difference (TD) learning is a fundamental technique in reinforcement learning that updates value estimates for states or state-action pairs using a TD target. This target represents an improved estimate of the true value by…

Machine Learning · Computer Science 2024-08-05 Wuhao Wang , Zhiyong Chen , Lepeng Zhang

Backstepping Temporal Difference Learning

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer form divergence…

Machine Learning · Computer Science 2025-04-21 Han-Dong Lim , Donghwan Lee

On Convergence of Emphatic Temporal-Difference Learning

We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved…

Machine Learning · Computer Science 2017-12-29 Huizhen Yu

Towards Robust Bisimulation Metric Learning

Learned representations in deep reinforcement learning (DRL) have to extract task-relevant information from complex observations, balancing between robustness to distraction and informativeness to the policy. Such stable and rich…

Machine Learning · Computer Science 2021-10-28 Mete Kemertas , Tristan Aumentado-Armstrong

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two…

Machine Learning · Computer Science 2019-06-20 Hugo Penedones , Carlos Riquelme , Damien Vincent , Hartmut Maennel , Timothy Mann , Andre Barreto , Sylvain Gelly , Gergely Neu

Approximate Temporal Difference Learning is a Gradient Descent for Reversible Policies

In reinforcement learning, temporal difference (TD) is the most direct algorithm to learn the value function of a policy. For large or infinite state spaces, exact representations of the value function are usually not available, and it must…

Machine Learning · Computer Science 2018-05-03 Yann Ollivier

Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces

Policy evaluation with linear function approximation is an important problem in reinforcement learning. When facing high-dimensional feature spaces, such a problem becomes extremely hard considering the computation efficiency and quality of…

Machine Learning · Computer Science 2018-05-28 Haifang Li , Yingce Xia , Wensheng Zhang

Preferential Temporal Difference Learning

Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD learning updates states whenever they are…

Machine Learning · Computer Science 2021-08-24 Nishanth Anand , Doina Precup

Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan , Siva Theja Maguluri , Justin Romberg

Representations for Stable Off-Policy Reinforcement Learning

Reinforcement learning with function approximation can be unstable and even divergent, especially when combined with off-policy learning and Bellman updates. In deep reinforcement learning, these issues have been dealt with empirically by…

Machine Learning · Computer Science 2020-10-06 Dibya Ghosh , Marc G. Bellemare

A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks

Reinforcement learning has been applied to many interesting problems such as the famous TD-gammon and the inverted helicopter flight. However, little effort has been put into developing methods to learn policies for complex persistent tasks…

Artificial Intelligence · Computer Science 2016-06-22 Xiao Li , Calin Belta