Related papers: Primal-Dual Algorithm for Distributed Reinforcemen…

Primal-Dual Distributed Temporal Difference Learning

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement…

Optimization and Control · Mathematics 2020-04-29 Donghwan Lee, Jianghai Hu

Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning

We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward problem, which is composed of…

Optimization and Control · Mathematics 2019-06-04 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

A primal-dual perspective for distributed TD-learning

The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as…

Machine Learning · Computer Science 2025-05-14 Han-Dong Lim, Donghwan Lee

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD)…

Machine Learning · Computer Science 2020-06-09 Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) following an unknown probability distribution. This paper presents a contrasting viewpoint, perceiving data points…

Machine Learning · Computer Science 2025-08-19 Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr

Nonlinear Distributional Gradient Temporal-Difference Learning

We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has been demonstrated to outperform the regular one in the recent study (Bellemare et al., 2017). In the…

Machine Learning · Computer Science 2019-04-04 Chao Qu, Shie Mannor, Huan Xu

A Differential Perspective on Distributional Reinforcement Learning

To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a discounted sum of rewards over time. In this work, we extend distributional RL…

Machine Learning · Computer Science 2026-01-14 Juan Sebastian Rojas, Chi-Guhn Lee

Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning

In large-scale distributed machine learning, recent works have studied the effects of compressing gradients in stochastic optimization to alleviate the communication bottleneck. These works have collectively revealed that stochastic…

Machine Learning · Computer Science 2024-06-05 Aritra Mitra, George J. Pappas, Hamed Hassani

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari, Daniel Russo, Raghav Singal

Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

In this paper we propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes with strict information…

Machine Learning · Computer Science 2021-04-20 Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic

Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization

We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local…

Optimization and Control · Mathematics 2021-11-08 Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance

We study reinforcement learning by combining recent advances in regularized linear programming formulations with the classical theory of stochastic approximation. Motivated by the challenge of designing algorithms that leverage off-policy…

Optimization and Control · Mathematics 2026-04-15 Axel Friedrich Wolter, Tobias Sutter

Statistical Efficiency of Distributional Temporal Difference Learning and Freedman's Inequality in Hilbert Spaces

Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One core task in DRL is distributional policy evaluation, which involves estimating the return distribution $\eta^\pi$ for a given policy $\pi$.…

Machine Learning · Statistics 2025-01-17 Yang Peng, Liangyu Zhang, Zhihua Zhang

Learning Dynamics and Generalization in Reinforcement Learning

Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially discontinuous value function, and generalizing well to new observations. In this paper, we analyze the learning dynamics of temporal…

Machine Learning · Computer Science 2022-06-07 Clare Lyle, Mark Rowland, Will Dabney, Marta Kwiatkowska, Yarin Gal

Learning sparse representations in reinforcement learning

Reinforcement learning (RL) algorithms allow artificial agents to improve their selection of actions to increase rewarding experiences in their environments. Temporal Difference (TD) Learning -- a model-free RL method -- is a leading…

Machine Learning · Computer Science 2019-09-05 Jacob Rafati, David C. Noelle

Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting

Distributional reinforcement learning (RL) is a powerful framework increasingly adopted in safety-critical domains for its ability to optimize risk-sensitive objectives. However, the role of the discount factor is often overlooked, as it is…

Machine Learning · Computer Science 2026-02-05 Mehrdad Moghimi, Anthony Coache, Hyejin Ku

Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes

The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL as…

Machine Learning · Computer Science 2026-02-19 Ethan Blaser, Jiuqi Wang, Shangtong Zhang

Foundations of Multivariate Distributional Reinforcement Learning

In reinforcement learning (RL), the consideration of multivariate reward signals has led to fundamental advancements in multi-objective decision-making, transfer learning, and representation learning. This work introduces the first…

Machine Learning · Computer Science 2024-09-05 Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Mark Rowland

Accelerated Distributional Temporal Difference Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…

Machine Learning · Statistics 2025-11-18 Kaicheng Jin, Yang Peng, Jiansheng Yang, Zhihua Zhang