Related papers: Coordinated Proximal Policy Optimization

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is…

Machine Learning · Computer Science 2022-11-07 Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu

Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

Policy optimization methods with function approximation are widely used in multi-agent reinforcement learning. However, it remains elusive how to design such algorithms with statistical guarantees. Leveraging a multi-agent performance…

Machine Learning · Computer Science 2023-05-09 Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee

Proximal Policy Optimization Smoothed Algorithm

Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each…

Machine Learning · Computer Science 2020-12-07 Wangshu Zhu, Andre Rosendo

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

Style-Preserving Policy Optimization for Game Agents

Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advancements in game AI based on reinforcement learning have predominantly focused on improving proficiency,…

Artificial Intelligence · Computer Science 2025-09-23 Lingfeng Li, Yunlong Lu, Yongyi Wang, Wenxin Li

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

ERPPO: Entropy Regularization-based Proximal Policy Optimization

Multi-Agent Proximal Policy Optimization (MAPPO) is a variant of the Proximal Policy Optimization (PPO) algorithm, specifically tailored for multi-agent reinforcement learning (MARL). MAPPO optimizes cooperative multi-agent settings by…

Machine Learning · Computer Science 2026-05-14 Changha Lee, Gyusang Cho

Decentralized Policy Optimization

The study of decentralized learning or independent learning in cooperative multi-agent reinforcement learning has a history of decades. Recently empirical studies show that independent PPO (IPPO) can obtain good performance, close to or…

Machine Learning · Computer Science 2022-11-08 Kefan Su, Zongqing Lu

Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

Recent works have applied the Proximal Policy Optimization (PPO) to the multi-agent cooperative tasks, such as Independent PPO (IPPO); and vanilla Multi-agent PPO (MAPPO) which has a centralized value function. However, previous literature…

Multiagent Systems · Computer Science 2023-06-09 Jian Hu, Siyue Hu, Shih-wei Liao

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

Enhancing PPO with Trajectory-Aware Hybrid Policies

Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms that has become a standard baseline in modern reinforcement learning with applications in numerous fields. Though it delivers stable…

Machine Learning · Computer Science 2025-02-25 Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R. Waite, Soumik Sarkar

Joint action loss for proximal policy optimization

PPO (Proximal Policy Optimization) is a state-of-the-art policy gradient algorithm that has been successfully applied to complex computer games such as Dota 2 and Honor of Kings. In these environments, an agent makes compound actions…

Machine Learning · Computer Science 2023-01-27 Xiulei Song, Yizhao Jin, Greg Slabaugh, Simon Lucas

Competitive Policy Optimization

A core challenge in policy optimization in competitive Markov decision processes is the design of efficient optimization methods with desirable convergence and stability properties. To tackle this, we propose competitive policy optimization…

Machine Learning · Computer Science 2020-06-19 Manish Prajapat, Kamyar Azizzadenesheli, Alexander Liniger, Yisong Yue, Anima Anandkumar

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress…

Machine Learning · Computer Science 2020-11-04 Perttu Hämäläinen, Amin Babadi, Xiaoxiao Ma, Jaakko Lehtinen

Conformal Constrained Policy Optimization for Cost-Effective LLM Agents

While large language models (LLMs) have recently made tremendous progress towards solving challenging AI problems, they have done so at increasingly steep computational and API costs. We propose a novel strategy where we combine multiple…

Machine Learning · Computer Science 2026-03-24 Wenwen Si, Sooyong Jang, Insup Lee, Osbert Bastani

MAPPO-LCR: Multi-Agent Proximal Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games

Spatial public goods games model collective dilemmas where individual payoffs depend on population-level strategy configurations. Most existing studies rely on evolutionary update rules or value-based reinforcement learning methods. These…

Multiagent Systems · Computer Science 2025-12-23 Zhaoqilin Yang, Axin Xiang, Kedi Yang, Tianjun Liu, Youliang Tian

Order Matters: Agent-by-agent Policy Optimization

While multi-agent trust region algorithms have achieved great success empirically in solving coordination tasks, most of them, however, suffer from a non-stationarity problem since agents update their policies simultaneously. In contrast, a…

Artificial Intelligence · Computer Science 2023-02-28 Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, Jun Wang, Weinan Zhang

An Adaptive Clipping Approach for Proximal Policy Optimization

Very recently proximal policy optimization (PPO) algorithms have been proposed as first-order optimization methods for effective reinforcement learning. While PPO is inspired by the same learning theory that justifies trust region policy…

Machine Learning · Computer Science 2018-04-20 Gang Chen, Yiming Peng, Mengjie Zhang

Proximal Policy Optimization via Enhanced Exploration Efficiency

Proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks. But the performance of this method is still affected by its exploration ability.…

Machine Learning · Computer Science 2020-11-12 Junwei Zhang, Zhenghao Zhang, Shuai Han, Shuai Lü