English
Related papers

Related papers: Finite-Time Performance of Distributed Temporal Di…

200 papers

We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward problem, which is composed of…

Optimization and Control · Mathematics 2019-06-04 Thinh T. Doan , Siva Theja Maguluri , Justin Romberg

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…

Machine Learning · Computer Science 2018-11-07 Jalaj Bhandari , Daniel Russo , Raghav Singal

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized…

Machine Learning · Computer Science 2020-01-31 Jun Sun , Gang Wang , Georgios B. Giannakis , Qinmin Yang , Zaiyue Yang

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). The temporal-difference (TD) learning is a reinforcement…

Optimization and Control · Mathematics 2020-04-29 Donghwan Lee , Jianghai Hu

We initiate the study of federated reinforcement learning under environmental heterogeneity by considering a policy evaluation problem. Our setup involves $N$ agents interacting with environments that share the same state and action space…

Machine Learning · Computer Science 2024-07-02 Han Wang , Aritra Mitra , Hamed Hassani , George J. Pappas , James Anderson

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). The temporal difference (TD) learning is a reinforcement learning (RL)…

Optimization and Control · Mathematics 2018-08-23 Donghwan Lee , Hyungjin Yoon , Naira Hovakimyan

We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local…

Optimization and Control · Mathematics 2021-11-08 Dongsheng Ding , Xiaohan Wei , Zhuoran Yang , Zhaoran Wang , Mihailo R. Jovanović

Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents. It can guarantee an optimal convergence rate that scales linearly in the number of agents, i.e., a…

Machine Learning · Computer Science 2025-10-10 Ankur Naskar , Gugan Thoppe , Utsav Negi , Vijay Gupta

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…

Machine Learning · Statistics 2025-11-18 Kaicheng Jin , Yang Peng , Jiansheng Yang , Zhihua Zhang

Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving $N$ agents, this beneficial effect usually comes in the form of an $N$-fold linear…

Multiagent Systems · Computer Science 2024-07-31 Nicolò Dal Fabbro , Arman Adibi , Aritra Mitra , George J. Pappas

In this paper we propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes with strict information…

Machine Learning · Computer Science 2021-04-20 Milos S. Stankovic , Marko Beko , Srdjan S. Stankovic

One of the most basic problems in reinforcement learning (RL) is policy evaluation: estimating the long-term return, i.e., value function, corresponding to a given fixed policy. The celebrated Temporal Difference (TD) learning algorithm…

Machine Learning · Computer Science 2025-02-10 Sreejeet Maity , Aritra Mitra

Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in…

Machine Learning · Computer Science 2020-10-29 Rui Liu , Alex Olshevsky

Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable,…

We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent. We show that in this…

Machine Learning · Computer Science 2024-06-04 Haoxing Tian , Ioannis Ch. Paschalidis , Alex Olshevsky

Temporal difference (TD) learning is a fundamental algorithm for estimating value functions in reinforcement learning. Recent finite-time analyses of TD with linear function approximation quantify its theoretical convergence rate. However,…

Machine Learning · Computer Science 2026-03-04 Yunxiang Li , Mark Schmidt , Reza Babanezhad , Sharan Vaswani

This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes,…

Optimization and Control · Mathematics 2021-10-12 Tao Sun , Han Shen , Tianyi Chen , Dongsheng Li

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central…

Machine Learning · Computer Science 2024-10-22 Sajad Khodadadian , Pranay Sharma , Gauri Joshi , Siva Theja Maguluri

This paper considers a novel multi-agent linear stochastic approximation algorithm driven by Markovian noise and general consensus-type interaction, in which each agent evolves according to its local stochastic approximation process which…

Machine Learning · Computer Science 2023-10-02 Yixuan Lin , Vijay Gupta , Ji Liu

TD($\lambda$) in value-based MARL algorithms or the Temporal Difference critic learning in Actor-Critic-based (AC-based) algorithms synergistically integrate elements from Monte-Carlo simulation and Q function bootstrapping via dynamic…

Machine Learning · Computer Science 2026-05-13 Yue Deng , Zirui Wang , Yin Zhang
‹ Prev 1 2 3 10 Next ›