Related papers: Asynchronous Coagent Networks

Coagent Networks Revisited

Coagent networks formalize the concept of arbitrary networks of stochastic agents that collaborate to take actions in a reinforcement learning environment. Prominent examples of coagent networks in action include approaches to hierarchical…

Machine Learning · Computer Science 2023-08-31 Modjtaba Shokrian Zini, Mohammad Pedramfar, Matthew Riemer, Ahmadreza Moradipari, Miao Liu

Edge-Compatible Reinforcement Learning for Recommendations

Most reinforcement learning (RL) recommendation systems designed for edge computing must either synchronize during recommendation selection or depend on an unprincipled patchwork collection of algorithms. In this work, we build on…

Machine Learning · Computer Science 2022-08-11 James E. Kostas, Philip S. Thomas, Georgios Theocharous

Policy Gradient Algorithms Implicitly Optimize by Continuation

Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification…

Machine Learning · Computer Science 2023-10-24 Adrien Bolland, Gilles Louppe, Damien Ernst

Symmetric (Optimistic) Natural Policy Gradient for Multi-agent Learning with Parameter Convergence

Multi-agent interactions are increasingly important in the context of reinforcement learning, and the theoretical foundations of policy gradient methods have attracted surging research interest. We investigate the global convergence of…

Optimization and Control · Mathematics 2023-03-21 Sarath Pattathil, Kaiqing Zhang, Asuman Ozdaglar

Natural Policy Gradients In Reinforcement Learning Explained

Traditional policy gradient methods are fundamentally flawed. Natural gradients converge quicker and better, forming the foundation of contemporary Reinforcement Learning such as Trust Region Policy Optimization (TRPO) and Proximal Policy…

Machine Learning · Computer Science 2022-09-07 W. J. A. van Heeswijk

Trajectory-Based Off-Policy Deep Reinforcement Learning

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently…

Machine Learning · Computer Science 2019-05-15 Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel

SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal Outcomes

In multiagent environments, the capability of learning is important for an agent to behave appropriately in the face of unknown opponents and a dynamic environment. From the system designer's perspective, it is desirable if the agents can learn…

Artificial Intelligence · Computer Science 2018-03-09 Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen, Karl Tuyls, Wanli Xue

Phasic Policy Gradient

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases. In prior methods, one must choose…

Machine Learning · Computer Science 2020-09-10 Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman

Coactive Learning for Locally Optimal Problem Solving

Coactive learning is an online problem solving setting where the solutions provided by a solver are interactively improved by a domain expert, which in turn drives learning. In this paper we extend the study of coactive learning to problems…

Machine Learning · Computer Science 2014-04-23 Robby Goetschalckx, Alan Fern, Prasad Tadepalli

Faster Policy Learning with Continuous-Time Gradients

We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The…

Machine Learning · Computer Science 2021-06-25 Samuel Ainsworth, Kendall Lowrey, John Thickstun, Zaid Harchaoui, Siddhartha Srinivasa

Gradient Informed Proximal Policy Optimization

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we…

Machine Learning · Computer Science 2023-12-15 Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, Ming C. Lin

Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional Reasoning Approach

Cooperative multi-agent problems often require coordination between agents, which can be achieved through a centralized policy that considers the global state. Multi-agent policy gradient (MAPG) methods are commonly used to learn such…

Robotics · Computer Science 2023-08-03 Xubo Lyu, Amin Banitalebi-Dehkordi, Mo Chen, Yong Zhang

Reinforcement Learning in Linear Quadratic Deep Structured Teams: Global Convergence of Policy Gradient Methods

In this paper, we study the global convergence of model-based and model-free policy gradient descent and natural policy gradient descent algorithms for linear quadratic deep structured teams. In such systems, agents are partitioned into a…

Multiagent Systems · Computer Science 2020-12-16 Vida Fathi, Jalal Arabneydi, Amir G. Aghdam

Towards Heterogeneous Multi-Agent Reinforcement Learning with Graph Neural Networks

This work proposes a neural network architecture that learns policies for multiple agent classes in a heterogeneous multi-agent reinforcement setting. The proposed network uses directed labeled graph representations for states, encodes…

Artificial Intelligence · Computer Science 2020-10-22 Douglas De Rizzo Meneghetti, Reinaldo Augusto da Costa Bianchi

Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

This paper studies a policy optimization problem arising from collaborative multi-agent reinforcement learning in a decentralized setting where agents communicate with their neighbors over an undirected graph to maximize the sum of their…

Optimization and Control · Mathematics 2022-09-07 Jinchi Chen, Jie Feng, Weiguo Gao, Ke Wei

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents. In particular, each agent perceives the environment as effectively…

Machine Learning · Computer Science 2021-06-15 Dong-Ki Kim, Miao Liu, Matthew Riemer, Chuangchuang Sun, Marwa Abdulhai, Golnaz Habibi, Sebastian Lopez-Cot, Gerald Tesauro, Jonathan P. How

A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with another biased one, an adapted SARAH estimator for policy optimization. The hybrid policy…

Machine Learning · Computer Science 2020-09-23 Nhan H. Pham, Lam M. Nguyen, Dzung T. Phan, Phuong Ha Nguyen, Marten van Dijk, Quoc Tran-Dinh

Cross-Gradient Aggregation for Decentralized Learning from Non-IID data

Decentralized learning enables a group of collaborative agents to learn models using a distributed dataset without the need for a central parameter server. Recently, decentralized learning algorithms have demonstrated state-of-the-art…

Machine Learning · Computer Science 2021-06-30 Yasaman Esfandiari, Sin Yong Tan, Zhanhong Jiang, Aditya Balu, Ethan Herron, Chinmay Hegde, Soumik Sarkar

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy…

Machine Learning · Computer Science 2019-11-20 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu

The Reinforce Policy Gradient Algorithm Revisited

We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with…

Machine Learning · Computer Science 2023-10-10 Shalabh Bhatnagar