Related papers: A Multi-Agent Off-Policy Actor-Critic Algorithm fo…

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

In this paper, we propose a distributed off-policy actor critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update…

Machine Learning · Computer Science 2019-03-25 Yan Zhang , Michael M. Zavlanos

Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments

This study proposes the use of a social learning method to estimate a global state within a multi-agent off-policy actor-critic algorithm for reinforcement learning (RL) operating in a partially observable environment. We assume that the…

Machine Learning · Computer Science 2024-07-09 Ainur Zhaikhan , Ali H. Sayed

Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on…

Machine Learning · Computer Science 2015-03-20 Thomas Degris , Martha White , Richard S. Sutton

Multi-Agent Fully Decentralized Value Function Learning with Linear Convergence Rates

This work develops a fully decentralized multi-agent algorithm for policy evaluation. The proposed scheme can be applied to two distinct scenarios. In the first scenario, a collection of agents have distinct datasets gathered following…

Machine Learning · Computer Science 2019-08-13 Lucas Cassano , Kun Yuan , Ali H. Sayed

Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method

We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work. In our setting, the global state, action, and reward are assumed to be fully observable, while the local policy is protected as privacy by each…

Multiagent Systems · Computer Science 2021-11-02 Kuo Li , Qing-Shan Jia

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL where an agent's objective is to compute an optimal policy based on the data…

Machine Learning · Computer Science 2022-06-16 Raghuram Bharadwaj Diddigi , Prateek Jain , Prabuchandran K. J. , Shalabh Bhatnagar

Distributed Policy Evaluation Under Multiple Behavior Strategies

We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The…

Multiagent Systems · Computer Science 2014-11-06 Sergio Valcarcel Macua , Jianshu Chen , Santiago Zazo , Ali H. Sayed

Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in…

Machine Learning · Computer Science 2019-05-29 Shariq Iqbal , Fei Sha

Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning

In this paper we propose several novel distributed gradient-based temporal difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes with strict information…

Machine Learning · Computer Science 2021-04-20 Milos S. Stankovic , Marko Beko , Srdjan S. Stankovic

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning

This paper considers a distributed reinforcement learning problem in which a network of multiple agents aim to cooperatively maximize the globally averaged return through communication with only local neighbors. A randomized…

Machine Learning · Computer Science 2019-07-09 Yixuan Lin , Kaiqing Zhang , Zhuoran Yang , Zhaoran Wang , Tamer Başar , Romeil Sandhu , Ji Liu

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can improve data efficiency by repeatedly using the previously gathered data. However, off-policy learning becomes challenging when the discrepancy…

Machine Learning · Computer Science 2023-09-27 Baturay Saglam , Dogan C. Cicek , Furkan B. Mutlu , Suleyman S. Kozat

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with both state-value and policy function-approximation, and are guaranteed to converge under off-policy training. Our solution targets problems in reinforcement learning…

Artificial Intelligence · Computer Science 2018-02-23 Hamid Reza Maei

Variance Penalized On-Policy and Off-Policy Actor-Critic

Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this…

Machine Learning · Computer Science 2021-02-04 Arushi Jain , Gandharv Patil , Ayush Jain , Khimya Khetarpal , Doina Precup

Multi-Agent Actor-Critic with Hierarchical Graph Attention Network

Most previous studies on multi-agent reinforcement learning focus on deriving decentralized and cooperative policies to maximize a common reward and rarely consider the transferability of trained policies to new tasks. This prevents such…

Machine Learning · Computer Science 2019-11-28 Heechang Ryu , Hayong Shin , Jinkyoo Park

Online Off-policy Prediction

This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving,…

Machine Learning · Computer Science 2018-11-08 Sina Ghiassian , Andrew Patterson , Martha White , Richard S. Sutton , Adam White

A Convergent Off-Policy Temporal Difference Algorithm

Learning the value function of a given policy (target policy) from the data samples obtained from a different policy (behavior policy) is an important problem in Reinforcement Learning (RL). This problem is studied under the setting of…

Machine Learning · Computer Science 2019-11-14 Raghuram Bharadwaj Diddigi , Chandramouli Kamanchi , Shalabh Bhatnagar

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning

We study the policy evaluation problem in multi-agent reinforcement learning. In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted accumulative reward problem, which is composed of…

Optimization and Control · Mathematics 2019-06-04 Thinh T. Doan , Siva Theja Maguluri , Justin Romberg

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents. In particular, each agent perceives the environment as effectively…

Machine Learning · Computer Science 2021-06-15 Dong-Ki Kim , Miao Liu , Matthew Riemer , Chuangchuang Sun , Marwa Abdulhai , Golnaz Habibi , Sebastian Lopez-Cot , Gerald Tesauro , Jonathan P. How

A Communication-Efficient Decentralized Actor-Critic Algorithm

In this paper, we study the problem of reinforcement learning in multi-agent systems where communication among agents is limited. We develop a decentralized actor-critic learning framework in which each agent performs several local updates…

Machine Learning · Computer Science 2025-10-23 Xiaoxing Ren , Nicola Bastianello , Thomas Parisini , Andreas A. Malikopoulos

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step, to estimate the value of the new…

Machine Learning · Computer Science 2019-12-12 Riashat Islam , Raihan Seraj , Samin Yeasar Arnob , Doina Precup