Related papers: Decentralized Policy Optimization

Model-Based Decentralized Policy Optimization

Decentralized policy optimization has been commonly used in cooperative multi-agent tasks. However, since all agents are updating their policies simultaneously, from the perspective of individual agents, the environment is non-stationary,…

Machine Learning · Computer Science 2023-02-17 Hao Luo, Jiechuan Jiang, Zongqing Lu

Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements

This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, Liquid-Graph-Time Clustering-IPPO, builds upon Independent Proximal Policy Optimization (IPPO) by…

Machine Learning · Statistics 2026-02-12 Antonio Marino, Esteban Restrepo, Claudio Pacchierotti, Paolo Robuffo Giordano

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more outstanding in multi-agent tasks, as each step of operation is more costly…

Machine Learning · Computer Science 2022-09-05 Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Multi-Agent Guided Policy Optimization

Due to practical constraints such as partial observability and limited communication, Centralized Training with Decentralized Execution (CTDE) has become the dominant paradigm in cooperative Multi-Agent Reinforcement Learning (MARL).…

Artificial Intelligence · Computer Science 2026-03-16 Yueheng Li, Guangming Xie, Zongqing Lu

Fully-Decentralized MADDPG with Networked Agents

In this paper, we devise three actor-critic algorithms with decentralized training for multi-agent reinforcement learning in cooperative, adversarial, and mixed settings with continuous action spaces. To this goal, we adapt the MADDPG…

Machine Learning · Computer Science 2025-03-11 Diego Bolliger, Lorenz Zauter, Robert Ziegler

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

Coordinated Proximal Policy Optimization

We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea lies in the coordinated adaptation of step size during the…

Artificial Intelligence · Computer Science 2021-11-09 Zifan Wu, Chao Yu, Deheng Ye, Junge Zhang, Haiyin Piao, Hankz Hankui Zhuo

Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?

Most recently developed approaches to cooperative multi-agent reinforcement learning in the centralized training with decentralized execution setting involve estimating a centralized, joint value function. In this paper, we…

Artificial Intelligence · Computer Science 2020-11-20 Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance…

Machine Learning · Computer Science 2023-05-09 Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee

Multi-Agent Trust Region Policy Optimization

We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases. By…

Artificial Intelligence · Computer Science 2023-08-08 Hepeng Li, Haibo He

Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization

Recent progress in state-only imitation learning extends the scope of applicability of imitation learning to real-world settings by relieving the need for observing expert actions. However, existing solutions only learn to extract a…

Machine Learning · Computer Science 2022-10-03 Minghuan Liu, Zhengbang Zhu, Yuzheng Zhuang, Weinan Zhang, Jianye Hao, Yong Yu, Jun Wang

A Multi-Objective Optimization framework for Decentralized Learning with coordination constraints

This article introduces a generalized framework for Decentralized Learning formulated as a Multi-Objective Optimization problem, in which both distributed agents and a central coordinator contribute independent, potentially conflicting…

Optimization and Control · Mathematics 2025-07-21 Roberto Morales, Umberto Biccari

Modified Actor-Critics

Recent successful deep reinforcement learning algorithms, such as Trust Region Policy Optimization (TRPO) or Proximal Policy Optimization (PPO), are fundamentally variations of conservative policy iteration (CPI). These algorithms iterate…

Machine Learning · Computer Science 2020-01-27 Erinc Merdivan, Sten Hanke, Matthieu Geist

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

In this paper, we propose a distributed off-policy actor critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update…

Machine Learning · Computer Science 2019-03-25 Yan Zhang, Michael M. Zavlanos

Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method

We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work. In our setting, the global state, action, and reward are assumed to be fully observable, while the local policy is protected as privacy by each…

Multiagent Systems · Computer Science 2021-11-02 Kuo Li, Qing-Shan Jia

Decentralized Learning for Optimality in Stochastic Dynamic Teams and Games with Local Control and Global State Information

Stochastic dynamic teams and games are rich models for decentralized systems and challenging testing grounds for multi-agent learning. Previous work that guaranteed team optimality assumed stateless dynamics, or an explicit coordination…

Optimization and Control · Mathematics 2024-03-28 Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is…

Machine Learning · Computer Science 2022-11-07 Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu

Decentralized Distributed Proximal Policy Optimization (DD-PPO) for High Performance Computing Scheduling on Multi-User Systems

Resource allocation in High Performance Computing (HPC) environments presents a complex and multifaceted challenge for job scheduling algorithms. Beyond the efficient allocation of system resources, schedulers must account for and optimize…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-08 Matthew Sgambati, Aleksandar Vakanski, Matthew Anderson

Towards an Understanding of Default Policies in Multitask Policy Optimization

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize…

Machine Learning · Computer Science 2022-03-24 Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano

Adversarial Policy Optimization in Deep Reinforcement Learning

The policy represented by the deep neural network can overfit the spurious features in observations, which hamper a reinforcement learning agent from learning effective policy. This issue becomes severe in high-dimensional state, where the…

Machine Learning · Computer Science 2023-05-01 Md Masudur Rahman, Yexiang Xue