Related papers: Proximal Policy Optimization Algorithms

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Stable Policy Optimization via Off-Policy Divergence Regularization

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a…

Machine Learning · Computer Science 2020-06-22 Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

Trust Region-Guided Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO…

Machine Learning · Computer Science 2019-11-11 Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

In sequence generation tasks, many works use policy gradient for model optimization to tackle the intractable backpropagation issue when maximizing the non-differentiable evaluation metrics or fooling the discriminator in adversarial…

Computation and Language · Computer Science 2018-08-27 Yi-Lin Tuan, Jinzhi Zhang, Yujia Li, Hung-yi Lee

Proximal Policy Optimization Smoothed Algorithm

Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each…

Machine Learning · Computer Science 2020-12-07 Wangshu Zhu, Andre Rosendo

Beyond the Boundaries of Proximal Policy Optimization

Proximal policy optimization (PPO) is a widely-used algorithm for on-policy reinforcement learning. This work offers an alternative perspective of PPO, in which it is decomposed into the inner-loop estimation of update vectors, and the…

Machine Learning · Computer Science 2024-11-04 Charlie B. Tan, Edan Toledo, Benjamin Ellis, Jakob N. Foerster, Ferenc Huszár

An Adaptive Clipping Approach for Proximal Policy Optimization

Very recently proximal policy optimization (PPO) algorithms have been proposed as first-order optimization methods for effective reinforcement learning. While PPO is inspired by the same learning theory that justifies trust region policy…

Machine Learning · Computer Science 2018-04-20 Gang Chen, Yiming Peng, Mengjie Zhang

Learning to Constrain Policy Optimization with Virtual Trust Region

We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust…

Machine Learning · Computer Science 2022-09-19 Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh

Simple Policy Optimization

Model-free reinforcement learning algorithms have seen remarkable progress, but key challenges remain. Trust Region Policy Optimization (TRPO) is known for ensuring monotonic policy improvement through conservative updates within a trust…

Machine Learning · Computer Science 2025-07-29 Zhengpeng Xie, Qiang Zhang, Fan Yang, Marco Hutter, Renjing Xu

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Decaying Clipping Range in Proximal Policy Optimization

Proximal Policy Optimization (PPO) is among the most widely used algorithms in reinforcement learning, which achieves state-of-the-art performance in many challenging problems. The keys to its success are the reliable policy updates through…

Machine Learning · Computer Science 2021-07-02 Mónika Farsang, Luca Szegletes

Gradient Informed Proximal Policy Optimization

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we…

Machine Learning · Computer Science 2023-12-15 Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, Ming C. Lin

An Approximate Ascent Approach To Prove Convergence of PPO

Proximal Policy Optimization (PPO) is among the most widely used deep reinforcement learning algorithms, yet its theoretical foundations remain incomplete. Most importantly, convergence and understanding of fundamental PPO advantages remain…

Machine Learning · Computer Science 2026-02-04 Leif Doering, Daniel Schmidt, Moritz Melcher, Sebastian Kassing, Benedikt Wille, Tilman Aach, Simon Weissmann

Reflective Policy Optimization

On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization

As the most successful variant and improvement for Trust Region Policy Optimization (TRPO), proximal policy optimization (PPO) has been widely applied across various domains with several advantages: efficient data utilization, easy…

Machine Learning · Computer Science 2019-02-15 Xiangxiang Chu

Clipped-Objective Policy Gradients for Pessimistic Policy Optimization

To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences.…

Machine Learning · Computer Science 2023-11-13 Jared Markowitz, Edward W. Staley

ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm

Deep reinforcement learning has been able to solve various tasks successfully; however, due to the construction of policy gradient and training dynamics, tuning deep reinforcement learning models remains challenging. As one of the most…

Machine Learning · Computer Science 2026-02-11 Hanyong Wang, Menglong Yang

Proximal Policy Gradient: PPO with Policy Gradient

In this paper, we propose a new algorithm PPG (Proximal Policy Gradient), which is close to both VPG (vanilla policy gradient) and PPO (proximal policy optimization). The PPG objective is a partial variation of the VPG objective and the…

Machine Learning · Computer Science 2020-10-21 Ju-Seung Byun, Byungmoon Kim, Huamin Wang

A Logarithmic Barrier Method For Proximal Policy Optimization

Proximal policy optimization (PPO) has been proposed as a first-order optimization method for reinforcement learning. Note that it uses an exterior penalty method. Often, the minimizers of the exterior penalty functions…

Machine Learning · Computer Science 2018-12-18 Cheng Zeng, Hongming Zhang

PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay

On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO)…

Machine Learning · Computer Science 2021-12-09 Xingxing Liang, Yang Ma, Yanghe Feng, Zhong Liu