Related papers: Multi-Path Policy Optimization

Proximal Policy Optimization via Enhanced Exploration Efficiency

Proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks. But the performance of this method is still affected by its exploration ability.…

Machine Learning · Computer Science 2020-11-12 Junwei Zhang, Zhenghao Zhang, Shuai Han, Shuai Lü

Trust Region-Guided Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO…

Machine Learning · Computer Science 2019-11-11 Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

Style-Preserving Policy Optimization for Game Agents

Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advancements in game AI based on reinforcement learning have predominantly focused on improving proficiency,…

Artificial Intelligence · Computer Science 2025-09-23 Lingfeng Li, Yunlong Lu, Yongyi Wang, Wenxin Li

Reflective Policy Optimization

On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

HCPO: Hierarchical Conductor-Based Policy Optimization in Multi-Agent Reinforcement Learning

In cooperative Multi-Agent Reinforcement Learning (MARL), efficient exploration is crucial for optimizing the performance of joint policy. However, existing methods often update joint policies via independent agent exploration, without…

Machine Learning · Computer Science 2025-11-18 Zejiao Liu, Junqi Tu, Yitian Hong, Luolin Xiong, Yaochu Jin, Yang Tang, Fangfei Li

ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm

Deep reinforcement learning has been able to solve various tasks successfully, however, due to the construction of policy gradient and training dynamics, tuning deep reinforcement learning models remains challenging. As one of the most…

Machine Learning · Computer Science 2026-02-11 Hanyong Wang, Menglong Yang

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or…

Machine Learning · Computer Science 2024-01-09 Wentse Chen, Shiyu Huang, Yuan Chiang, Tim Pearce, Wei-Wei Tu, Ting Chen, Jun Zhu

Enhancing PPO with Trajectory-Aware Hybrid Policies

Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms that has become a standard baseline in modern reinforcement learning with applications in numerous fields. Though it delivers stable…

Machine Learning · Computer Science 2025-02-25 Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R. Waite, Soumik Sarkar

Intrinsic Reward Policy Optimization for Sparse-Reward Environments

Exploration is essential in reinforcement learning as an agent relies on trial and error to learn an optimal policy. However, when rewards are sparse, naive exploration strategies, like noise injection, are often insufficient. Intrinsic…

Machine Learning · Computer Science 2026-01-30 Minjae Cho, Huy Trong Tran

ERPPO: Entropy Regularization-based Proximal Policy Optimization

Multi-Agent Proximal Policy Optimization (MAPPO) is a variant of the Proximal Policy Optimization (PPO) algorithm, specifically tailored for multi-agent reinforcement learning (MARL). MAPPO optimizes cooperative multi-agent settings by…

Machine Learning · Computer Science 2026-05-14 Changha Lee, Gyusang Cho

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is…

Machine Learning · Computer Science 2022-11-07 Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu

Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration

Differentiable planning enables gradient-based optimization of decision-making problems by leveraging differentiable models of system dynamics. However, in highly nonlinear and hybrid discrete-continuous domains, the resulting optimization…

Artificial Intelligence · Computer Science 2026-05-11 Yuval Aroosh, Ayal Taitler

Match or Replay: Self Imitating Proximal Policy Optimization

Reinforcement Learning (RL) agents often struggle with inefficient exploration, particularly in environments with sparse rewards. Traditional exploration strategies can lead to slow learning and suboptimal performance because agents fail to…

Machine Learning · Computer Science 2026-03-31 Gaurav Chaudhary, Laxmidhar Behera, Washim Uddin Mondal

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy

A fundamental issue in reinforcement learning algorithms is the balance between exploration of the environment and exploitation of information already obtained by the agent. Especially, exploration has played a critical role for both…

Machine Learning · Computer Science 2019-05-29 Ruihan Yang, Qiwei Ye, Tie-Yan Liu

Improving DAPO from a Mixed-Policy Perspective

This paper introduces two novel modifications to the Dynamic sAmpling Policy Optimization (DAPO) algorithm [1], approached from a mixed-policy perspective. Standard policy gradient methods can suffer from instability and sample…

Machine Learning · Computer Science 2025-08-20 Hongze Tan, Yuchen Li

M3PO: Massively Multi-Task Model-Based Policy Optimization

We introduce Massively Multi-Task Model-Based Policy Optimization (M3PO), a scalable model-based reinforcement learning (MBRL) framework designed to address sample inefficiency in single-task settings and poor generalization in multi-task…

Machine Learning · Computer Science 2025-06-30 Aditya Narendra, Dmitry Makarov, Aleksandr Panov

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

We present Memory Augmented Policy Optimization (MAPO), a simple and novel way to leverage a memory buffer of promising trajectories to reduce the variance of policy gradient estimate. MAPO is applicable to deterministic environments with…

Machine Learning · Computer Science 2019-01-15 Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc Le, Ni Lao

Constrained Policy Optimization

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science 2017-05-31 Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel