Related papers: Model-Based Decentralized Policy Optimization

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more outstanding in multi-agent tasks, as each step of operation is more costly…

Machine Learning · Computer Science 2022-09-05 Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Decentralized Policy Optimization

The study of decentralized learning or independent learning in cooperative multi-agent reinforcement learning has a history of decades. Recently empirical studies show that independent PPO (IPPO) can obtain good performance, close to or…

Machine Learning · Computer Science 2022-11-08 Kefan Su, Zongqing Lu

Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration

Differentiable planning enables gradient-based optimization of decision-making problems by leveraging differentiable models of system dynamics. However, in highly nonlinear and hybrid discrete-continuous domains, the resulting optimization…

Artificial Intelligence · Computer Science 2026-05-11 Yuval Aroosh, Ayal Taitler

Multi-Agent Guided Policy Optimization

Due to practical constraints such as partial observability and limited communication, Centralized Training with Decentralized Execution (CTDE) has become the dominant paradigm in cooperative Multi-Agent Reinforcement Learning (MARL).…

Artificial Intelligence · Computer Science 2026-03-16 Yueheng Li, Guangming Xie, Zongqing Lu

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

Coordinated Proximal Policy Optimization

We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea lies in the coordinated adaptation of step size during the…

Artificial Intelligence · Computer Science 2021-11-09 Zifan Wu, Chao Yu, Deheng Ye, Junge Zhang, Haiyin Piao, Hankz Hankui Zhuo

Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy…

Machine Learning · Computer Science 2024-05-01 Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar

Latent-Variable Advantage-Weighted Policy Optimization for Offline RL

Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions. This setting is particularly well-suited for continuous control robotic…

Machine Learning · Computer Science 2022-03-18 Xi Chen, Ali Ghadirzadeh, Tianhe Yu, Yuan Gao, Jianhao Wang, Wenzhe Li, Bin Liang, Chelsea Finn, Chongjie Zhang

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic…

Machine Learning · Computer Science 2018-09-17 Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel

Multi-Agent Trust Region Policy Optimization

We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases. By…

Artificial Intelligence · Computer Science 2023-08-08 Hepeng Li, Haibo He

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance…

Machine Learning · Computer Science 2023-05-09 Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee

Bidirectional Model-based Policy Optimization

Model-based reinforcement learning approaches leverage a forward dynamics model to support planning and decision making, which, however, may fail catastrophically if the model is inaccurate. Although there are several existing methods…

Machine Learning · Computer Science 2020-09-30 Hang Lai, Jian Shen, Weinan Zhang, Yong Yu

Order Matters: Agent-by-agent Policy Optimization

While multi-agent trust region algorithms have achieved great success empirically in solving coordination tasks, most of them, however, suffer from a non-stationarity problem since agents update their policies simultaneously. In contrast, a…

Artificial Intelligence · Computer Science 2023-02-28 Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, Jun Wang, Weinan Zhang

Improving DAPO from a Mixed-Policy Perspective

This paper introduces two novel modifications to the Dynamic sAmpling Policy Optimization (DAPO) algorithm [1], approached from a mixed-policy perspective. Standard policy gradient methods can suffer from instability and sample…

Machine Learning · Computer Science 2025-08-20 Hongze Tan, Yuchen Li

Policy Iteration for Decentralized Control of Markov Decision Processes

Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov…

Artificial Intelligence · Computer Science 2014-01-16 Daniel S. Bernstein, Christopher Amato, Eric A. Hansen, Shlomo Zilberstein

Style-Preserving Policy Optimization for Game Agents

Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advancements in game AI based on reinforcement learning have predominantly focused on improving proficiency,…

Artificial Intelligence · Computer Science 2025-09-23 Lingfeng Li, Yunlong Lu, Yongyi Wang, Wenxin Li

Trajectory Optimization for Nonlinear Multi-Agent Systems using Decentralized Learning Model Predictive Control

We present a decentralized minimum-time trajectory optimization scheme based on learning model predictive control for multi-agent systems with nonlinear decoupled dynamics and coupled state constraints. By performing the same task…

Systems and Control · Electrical Eng. & Systems 2020-12-21 Edward L. Zhu, Yvonne R. Stürz, Ugo Rosolia, Francesco Borrelli

Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach

This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that…

Systems and Control · Electrical Eng. & Systems 2020-10-26 Yingying Li, Yujie Tang, Runyu Zhang, Na Li

Multi-Agent Fully Decentralized Value Function Learning with Linear Convergence Rates

This work develops a fully decentralized multi-agent algorithm for policy evaluation. The proposed scheme can be applied to two distinct scenarios. In the first scenario, a collection of agents have distinct datasets gathered following…

Machine Learning · Computer Science 2019-08-13 Lucas Cassano, Kun Yuan, Ali H. Sayed

Towards an Understanding of Default Policies in Multitask Policy Optimization

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize…

Machine Learning · Computer Science 2022-03-24 Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano