Related papers: Distributed Value Function Approximation for Colla…

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy…

Machine Learning · Computer Science 2019-11-20 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu

Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

We study the policy evaluation problem in multi-agent reinforcement learning, modeled by a Markov decision process. In this problem, the agents operate in a common environment under a fixed control policy, working together to discover the…

Optimization and Control · Mathematics 2020-01-13 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Primal-Dual Distributed Temporal Difference Learning

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for a class of multi-agent Markov decision processes (MDPs). Temporal-difference (TD) learning is a reinforcement…

Optimization and Control · Mathematics 2020-04-29 Donghwan Lee, Jianghai Hu

Real-Time Reinforcement Learning of Constrained Markov Decision Processes with Weak Derivatives

We present on-line policy gradient algorithms for computing the locally optimal policy of a constrained, average cost, finite state Markov Decision Process. The stochastic approximation algorithms require estimation of the gradient of the…

Optimization and Control · Mathematics 2018-12-18 Vikram Krishnamurthy, Felisa Vazquez Abad

Primal-Dual Algorithm for Distributed Reinforcement Learning: Distributed GTD

The goal of this paper is to study a distributed version of the gradient temporal-difference (GTD) learning algorithm for multi-agent Markov decision processes (MDPs). Temporal difference (TD) learning is a reinforcement learning (RL)…

Optimization and Control · Mathematics 2018-08-23 Donghwan Lee, Hyungjin Yoon, Naira Hovakimyan

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning

We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward problem, which is composed of…

Optimization and Control · Mathematics 2019-06-04 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning

This paper studies a distributed policy gradient in collaborative multi-agent reinforcement learning (MARL), where agents over a communication network aim to find the optimal policy to maximize the average of all agents' local returns. Due…

Multiagent Systems · Computer Science 2022-12-06 Xiaoxiao Zhao, Jinlong Lei, Li Li, Jie Chen

Multi-Agent Fully Decentralized Value Function Learning with Linear Convergence Rates

This work develops a fully decentralized multi-agent algorithm for policy evaluation. The proposed scheme can be applied to two distinct scenarios. In the first scenario, a collection of agents have distinct datasets gathered following…

Machine Learning · Computer Science 2019-08-13 Lucas Cassano, Kun Yuan, Ali H. Sayed

On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several…

Machine Learning · Computer Science 2018-03-30 Huizhen Yu

Fast Multi-Agent Temporal-Difference Learning via Homotopy Stochastic Primal-Dual Optimization

We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local…

Optimization and Control · Mathematics 2021-11-08 Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović

Temporal Difference Learning as Gradient Splitting

Temporal difference learning with linear function approximation is a popular method to obtain a low-dimensional approximation of the value function of a policy in a Markov Decision Process. We give a new interpretation of this method in…

Machine Learning · Computer Science 2020-10-29 Rui Liu, Alex Olshevsky

A primal-dual perspective for distributed TD-learning

The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach is based on distributed optimization algorithms, which can be interpreted as…

Machine Learning · Computer Science 2025-05-14 Han-Dong Lim, Donghwan Lee

Value Function Decomposition in Markov Recommendation Process

Recent advances in recommender systems have shown that user-system interaction essentially formulates long-term optimization problems, and online reinforcement learning can be adopted to improve recommendation performance. The general…

Information Retrieval · Computer Science 2025-02-04 Xiaobei Wang, Shuchang Liu, Qingpeng Cai, Xiang Li, Lantao Hu, Han Li, Guangming Xie

O$^2$TD: (Near)-Optimal Off-Policy TD Learning

Temporal difference learning and Residual Gradient methods are the most widely used temporal difference based learning algorithms; however, it has been shown that none of their objective functions is optimal w.r.t. approximating the true…

Machine Learning · Computer Science 2017-04-21 Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz

Distributed Multi-Agent Optimization with State-Dependent Communication

We study distributed algorithms for solving global optimization problems in which the objective function is the sum of local objective functions of agents and the constraint set is given by the intersection of local constraint sets of…

Optimization and Control · Mathematics 2015-03-14 Ilan Lobel, Asuman Ozdaglar, Diego Feijer

Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph

Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks usually assume undirected coordination graphs and communication graphs while estimating a global reward via consensus algorithms for policy evaluation.…

Multiagent Systems · Computer Science 2022-01-14 Gangshan Jing, He Bai, Jemin George, Aranya Chakrabortty, Piyush K. Sharma

Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation

We study multi-agent reinforcement learning in the setting of episodic Markov decision processes, where multiple agents cooperate via communication through a central server. We propose a provably efficient algorithm based on value iteration…

Machine Learning · Computer Science 2023-06-27 Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu

Differential Temporal Difference Learning

Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to…

Machine Learning · Computer Science 2020-03-02 Adithya M. Devraj, Ioannis Kontoyiannis, Sean P. Meyn

Distributed Policy Evaluation Under Multiple Behavior Strategies

We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The…

Multiagent Systems · Computer Science 2014-11-06 Sergio Valcarcel Macua, Jianshu Chen, Santiago Zazo, Ali H. Sayed

Distributed Adaptive Gradient Optimization Algorithm

In this paper, a distributed optimization problem with general differentiable convex objective functions is studied for single-integrator and double-integrator multi-agent systems. Two distributed adaptive optimization algorithms are…

Optimization and Control · Mathematics 2017-03-28 Peng Lin, Wei Ren
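Several entries above consider networked agents that jointly observe the state but receive private local rewards (as the Ding et al. abstract puts it) and cooperatively evaluate a fixed policy by combining local TD-style updates with communication between neighbours. The following is a minimal sketch of that generic "mix with neighbours, then take a local TD(0) step" pattern with linear features. It is not the algorithm of any specific paper listed here. To keep it deterministic it uses the expected (mean-path) update in place of sampled transitions, and the 4-state chain, features, rewards, ring graph, and function names (`stationary_distribution`, `td_matrices`, `expected_distributed_td0`) are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution of an irreducible, aperiodic transition matrix P."""
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()


def td_matrices(P, Phi, rewards, gamma):
    """A = Phi^T D (Phi - gamma P Phi), shared by all agents; row i of B is (Phi^T D r_i)^T."""
    D = np.diag(stationary_distribution(P))
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    B = rewards @ D @ Phi
    return A, B


def expected_distributed_td0(P, Phi, rewards, W, gamma=0.9, alpha=0.5, iters=2000):
    """Consensus-based distributed TD(0) in expected (mean-path) form (illustrative sketch).

    P: (S, S) transition matrix under the fixed policy, shared by all agents.
    Phi: (S, d) feature matrix; rewards: (N, S) local expected rewards;
    W: (N, N) doubly stochastic mixing matrix of the communication graph.
    Returns an (N, d) array whose row i is agent i's parameter vector.
    """
    A, B = td_matrices(P, Phi, rewards, gamma)
    Theta = np.zeros((rewards.shape[0], Phi.shape[1]))
    for _ in range(iters):
        # Average with neighbours, then take a TD step driven by the agent's own reward.
        Theta = W @ Theta + alpha * (B - Theta @ A.T)
    return Theta


# Hypothetical 4-state chain: stay w.p. 0.5, otherwise move to the next state cyclically.
S, N = 4, 5
P = 0.5 * np.eye(S) + 0.5 * np.roll(np.eye(S), 1, axis=1)
Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])  # 2 features, states 0/2 and 1/3 aliased
rewards = np.random.default_rng(0).uniform(0.0, 1.0, size=(N, S))  # private local rewards
# Ring graph with weight 1/3 on self and on each neighbour (symmetric, doubly stochastic).
W = (np.eye(N) + np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)) / 3

Theta = expected_distributed_td0(P, Phi, rewards, W)
A, B = td_matrices(P, Phi, rewards, gamma=0.9)
theta_star = np.linalg.solve(A, B.mean(axis=0))  # centralized TD(0) fixed point for the mean reward
# Single "agent" seeing the averaged reward: the centralized expected TD(0) recursion.
theta_central = expected_distributed_td0(P, Phi, rewards.mean(axis=0, keepdims=True), np.eye(1))
print("network average:", Theta.mean(axis=0))
print("centralized    :", theta_star)
```

Because `W` is doubly stochastic and every agent shares the same `A`, the row average of `Theta` obeys exactly the centralized expected TD(0) recursion for the mean reward, so it converges to `theta_star` when `alpha` is small enough. Individual rows generally keep a small, step-size-dependent disagreement when the local rewards differ.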