English
Related papers

Related papers: Proximal Gradient Temporal Difference Learning: St…

200 papers

In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our…

Machine Learning · Computer Science 2020-07-06 Bo Liu , Ji Liu , Mohammad Ghavamzadeh , Sridhar Mahadevan , Marek Petrik

Off-policy algorithms, in which a behavior policy differs from the target policy and is used to gain experience for learning, have proven to be of great practical value in reinforcement learning. However, even for simple convex problems…

Machine Learning · Computer Science 2022-09-13 Rong J. B. Zhu , James M. Murray

Sutton, Szepesv\'{a}ri and Maei introduced the first gradient temporal-difference (GTD) learning algorithms compatible with both linear function approximation and off-policy training. The goal of this paper is (a) to propose some variants…

Machine Learning · Computer Science 2024-01-23 Donghwan Lee , Han-Dong Lim , Jihoon Park , Okyong Choi

Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy…

Machine Learning · Computer Science 2021-11-23 Rohan Deb , Shalabh Bhatnagar

We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has been demonstrated to outperform the regular one in the recent study \citep{bellemare2017distributional}. In the…

Machine Learning · Computer Science 2019-04-04 Chao Qu , Shie Mannor , Huan Xu

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several…

Machine Learning · Computer Science 2018-03-30 Huizhen Yu

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). The temporal-difference (TD) learning is a reinforcement…

Optimization and Control · Mathematics 2020-04-29 Donghwan Lee , Jianghai Hu

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). The temporal difference (TD) learning is a reinforcement learning (RL)…

Optimization and Control · Mathematics 2018-08-23 Donghwan Lee , Hyungjin Yoon , Naira Hovakimyan

We consider large linear and nonlinear fixed point problems, and solution with proximal algorithms. We show that there is a close connection between two seemingly different types of methods from distinct fields: 1) Proximal iterations for…

Numerical Analysis · Computer Science 2019-09-05 Dimitri P. Bertsekas

Stochastic gradient descent (SGD) is a standard optimization method to minimize a training error with respect to network parameters in modern neural network learning. However, it typically suffers from proliferation of saddle points in the…

Machine Learning · Computer Science 2017-11-23 Haiping Huang , Taro Toyoizumi

Temporal-difference (TD) learning is highly effective at controlling and evaluating an agent's long-term outcomes. Most approaches in this paradigm implement a semi-gradient update to boost the learning speed, which consists of ignoring the…

Machine Learning · Computer Science 2026-05-15 Théo Vincent , Kevin Gerhardt , Yogesh Tripathi , Habib Maraqten , Adam White , Martha White , Jan Peters , Carlo D'Eramo

Off-policy learning enables a reinforcement learning (RL) agent to reason counterfactually about policies that are not executed and is one of the most important ideas in RL. It, however, can lead to instability when combined with function…

Machine Learning · Computer Science 2025-03-03 Xiaochi Qian , Shangtong Zhang

There are several applications of stochastic optimization where one can benefit from a robust estimate of the gradient. For example, domains such as distributed learning with corrupted nodes, the presence of large outliers in the training…

Machine Learning · Statistics 2025-10-30 Fabian Schaipp , Guillaume Garrigos , Umut Simsekli , Robert Gower

Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…

Machine Learning · Computer Science 2025-06-24 Hwanwoo Kim , Panos Toulis , Eric Laber

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods have been known to produce better results in certain cases. Much of this, however, is due…

Machine Learning · Computer Science 2025-04-22 Eric Lu

In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to…

Machine Learning · Computer Science 2014-05-28 Sridhar Mahadevan , Bo Liu , Philip Thomas , Will Dabney , Steve Giguere , Nicholas Jacek , Ian Gemp , Ji Liu

In this paper, we study the Temporal Difference (TD) learning with linear value function approximation. It is well known that most TD learning algorithms are unstable with linear function approximation and off-policy learning. Recent…

Artificial Intelligence · Computer Science 2016-10-06 Dominik Meyer , Hao Shen , Klaus Diepold

Temporal difference (TD) learning is a widely used method to evaluate policies in reinforcement learning. While many TD learning methods have been developed in recent years, little attention has been paid to preserving privacy and most of…

Machine Learning · Computer Science 2022-01-26 Canzhe Zhao , Yanjie Ze , Jing Dong , Baoxiang Wang , Shuai Li

Reinforcement learning lies at the intersection of several challenges. Many applications of interest involve extremely large state spaces, requiring function approximation to enable tractable computation. In addition, the learner has only a…

Machine Learning · Computer Science 2021-05-11 Andrew Jacobsen , Alan Chan
‹ Prev 1 2 3 10 Next ›