Related papers: Finite-Time Performance of Distributed Temporal Di…

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning

We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward problem, which is composed of…

Optimization and Control · Mathematics 2019-06-04 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari, Daniel Russo, Raghav Singal

Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized…

Machine Learning · Computer Science 2020-01-31 Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang

Primal-Dual Distributed Temporal Difference Learning

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement…

Optimization and Control · Mathematics 2020-04-29 Donghwan Lee, Jianghai Hu

Federated Temporal Difference Learning with Linear Function Approximation under Environmental Heterogeneity

We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves $N$ agents interacting with environments that share the same state and action space…

Machine Learning · Computer Science 2024-07-02 Han Wang, Aritra Mitra, Hamed Hassani, George J. Pappas, James Anderson

Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement learning (RL)…

Optimization and Control · Mathematics 2018-08-23 Donghwan Lee, Hyungjin Yoon, Naira Hovakimyan

Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization

We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local…

Optimization and Control · Mathematics 2021-11-08 Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

Parameter-Free Federated TD Learning with Markov Noise in Heterogeneous Environments

Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents. It can guarantee an optimal convergence rate that scales linearly in the number of agents, i.e., a…

Machine Learning · Computer Science 2025-10-10 Ankur Naskar, Gugan Thoppe, Utsav Negi, Vijay Gupta

Accelerated Distributional Temporal Difference Learning with Linear Function Approximation

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…

Machine Learning · Statistics 2025-11-18 Kaicheng Jin, Yang Peng, Jiansheng Yang, Zhihua Zhang

Finite-Time Analysis of Asynchronous Multi-Agent TD Learning

Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving $N$ agents, this beneficial effect usually comes in the form of an $N$-fold linear…

Multiagent Systems · Computer Science 2024-07-31 Nicolò Dal Fabbro, Arman Adibi, Aritra Mitra, George J. Pappas

Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

In this paper we propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes with strict information…

Machine Learning · Computer Science 2021-04-20 Milos S. Stankovic, Marko Beko, Srdjan S. Stankovic

Adversarially-Robust TD Learning with Markovian Data: Finite-Time Rates and Fundamental Limits

One of the most basic problems in reinforcement learning (RL) is policy evaluation: estimating the long-term return, i.e., value function, corresponding to a given fixed policy. The celebrated Temporal Difference (TD) learning algorithm…

Machine Learning · Computer Science 2025-02-10 Sreejeet Maity, Aritra Mitra

Temporal Difference Learning as Gradient Splitting

Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in…

Machine Learning · Computer Science 2020-10-29 Rui Liu, Alex Olshevsky

Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable,…

Machine Learning · Computer Science 2024-11-18 Cameron Allen, Aaron Kirtland, Ruo Yu Tao, Sam Lobel, Daniel Scott, Nicholas Petrocelli, Omer Gottesman, Ronald Parr, Michael L. Littman, George Konidaris

One-Shot Averaging for Distributed TD($\lambda$) Under Markov Sampling

We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent. We show that in this…

Machine Learning · Computer Science 2024-06-04 Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky

Towards Parameter-Free Temporal Difference Learning

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li, Mark Schmidt, Reza Babanezhad, Sharan Vaswani

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun, Han Shen, Tianyi Chen, Dongsheng Li

Federated Stochastic Approximation under Markov Noise and Heterogeneity: Applications in Reinforcement Learning

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central…

Machine Learning · Computer Science 2024-10-22 Sajad Khodadadian, Pranay Sharma, Gauri Joshi, Siva Theja Maguluri

Finite-Time Error Bounds for Distributed Linear Stochastic Approximation

This paper considers a novel multi-agent linear stochastic approximation algorithm driven by Markovian noise and general consensus-type interaction, in which each agent evolves according to its local stochastic approximation process which…

Machine Learning · Computer Science 2023-10-02 Yixuan Lin, Vijay Gupta, Ji Liu

Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning

TD($\lambda$) in value-based MARL algorithms or the Temporal Difference critic learning in Actor-Critic-based (AC-based) algorithms synergistically integrate elements from Monte-Carlo simulation and Q function bootstrapping via dynamic…

Machine Learning · Computer Science 2026-05-13 Yue Deng, Zirui Wang, Yin Zhang