Related papers: Finite Sample Analyses for TD(0) with Function App…

Towards Parameter-Free Temporal Difference Learning

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li , Mark Schmidt , Reza Babanezhad , Sharan Vaswani

On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

We provide non-asymptotic bounds for the well-known temporal difference learning algorithm TD(0) with linear function approximators. These include high-probability bounds as well as bounds in expectation. Our analysis suggests that a…

Machine Learning · Computer Science 2015-09-02 Nathaniel Korda , L. A. Prashanth

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun , Han Shen , Tianyi Chen , Dongsheng Li

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates are typically established under the assumption of linearly independent features, which does not hold…

Machine Learning · Computer Science 2025-10-15 Zixuan Xie , Xinyu Liu , Rohan Chandra , Shangtong Zhang

A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections or Strong Convexity

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in the field of reinforcement learning. We are interested in the so-called ``robust''…

Machine Learning · Computer Science 2025-09-26 Wei-Cheng Lee , Francesco Orabona

Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

Temporal difference (TD) learning with linear function approximation (linear TD) is a classic and powerful prediction algorithm in reinforcement learning. While it is well-understood that linear TD converges almost surely to a unique point,…

Machine Learning · Computer Science 2026-03-25 Jiuqi Wang , Shangtong Zhang

A Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted…

Machine Learning · Statistics 2025-05-14 Yang Peng , Kaicheng Jin , Liangyu Zhang , Zhihua Zhang

Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized…

Machine Learning · Computer Science 2020-01-31 Jun Sun , Gang Wang , Georgios B. Giannakis , Qinmin Yang , Zaiyue Yang

Accelerated Distributional Temporal Difference Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…

Machine Learning · Statistics 2025-11-18 Kaicheng Jin , Yang Peng , Jiansheng Yang , Zhihua Zhang

Temporal Difference Learning with Continuous Time and State in the Stochastic Setting

We consider the problem of continuous-time policy evaluation. This consists in learning through observations the value function associated with an uncontrolled continuous-time stochastic dynamic and a reward function. We propose two…

Machine Learning · Computer Science 2023-06-08 Ziad Kobeissi , Francis Bach

Convergence of TD(0) under Polynomial Mixing with Nonlinear Function Approximation

Temporal Difference Learning (TD(0)) is fundamental in reinforcement learning, yet its finite-sample behavior under non-i.i.d. data and nonlinear approximation remains unknown. We provide the first high-probability, finite-sample analysis…

Machine Learning · Statistics 2025-05-22 Anupama Sridhar , Alexander Johansen

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal…

Artificial Intelligence · Computer Science 2008-02-03 P. Cichosz

Differential Temporal Difference Learning

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to…

Machine Learning · Computer Science 2020-03-02 Adithya M. Devraj , Ioannis Kontoyiannis , Sean P. Meyn

Reducing Sampling Error in Batch Temporal Difference Learning

Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning. This paper studies the use of TD(0), a canonical TD algorithm, to estimate the value function of a given policy from a batch of data. In this…

Machine Learning · Computer Science 2020-08-18 Brahma Pavse , Ishan Durugkar , Josiah Hanna , Peter Stone

Differential TD Learning for Value Function Approximation

Value functions arise as a component of algorithms as well as performance metrics in statistics and engineering applications. Computation of the associated Bellman equations is numerically challenging in all but a few special cases. A…

Systems and Control · Computer Science 2018-12-27 Adithya M. Devraj , Sean P. Meyn

Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning

We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward problem, which is composed of…

Optimization and Control · Mathematics 2019-06-04 Thinh T. Doan , Siva Theja Maguluri , Justin Romberg

Statistical Inference for Temporal Difference Learning with Linear Function Approximation

We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the…

Machine Learning · Statistics 2026-02-25 Weichen Wu , Gen Li , Yuting Wei , Alessandro Rinaldo

Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample…

Artificial Intelligence · Computer Science 2018-06-06 Gal Dalal , Balazs Szorenyi , Gugan Thoppe , Shie Mannor