Related papers: Proximal Policy Optimization Smoothed Algorithm

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Decaying Clipping Range in Proximal Policy Optimization

Proximal Policy Optimization (PPO) is among the most widely used algorithms in reinforcement learning, which achieves state-of-the-art performance in many challenging problems. The keys to its success are the reliable policy updates through…

Machine Learning · Computer Science 2021-07-02 Mónika Farsang, Luca Szegletes

An Adaptive Clipping Approach for Proximal Policy Optimization

Very recently proximal policy optimization (PPO) algorithms have been proposed as first-order optimization methods for effective reinforcement learning. While PPO is inspired by the same learning theory that justifies trust region policy…

Machine Learning · Computer Science 2018-04-20 Gang Chen, Yiming Peng, Mengjie Zhang

ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm

Deep reinforcement learning has been able to solve various tasks successfully; however, due to the construction of policy gradient and training dynamics, tuning deep reinforcement learning models remains challenging. As one of the most…

Machine Learning · Computer Science 2026-02-11 Hanyong Wang, Menglong Yang

A dynamical clipping approach with task feedback for Proximal Policy Optimization

Proximal Policy Optimization (PPO) has been broadly applied to robotics learning, showcasing stable training performance. However, the fixed clipping bound setting may limit the performance of PPO. Specifically, there is no theoretical…

Machine Learning · Computer Science 2024-11-07 Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Hongyin Zhang, Jinxin Liu, Donglin Wang, Shuai Zhang

Simple Policy Optimization

Model-free reinforcement learning algorithms have seen remarkable progress, but key challenges remain. Trust Region Policy Optimization (TRPO) is known for ensuring monotonic policy improvement through conservative updates within a trust…

Machine Learning · Computer Science 2025-07-29 Zhengpeng Xie, Qiang Zhang, Fan Yang, Marco Hutter, Renjing Xu

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

Clipped-Objective Policy Gradients for Pessimistic Policy Optimization

To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences.…

Machine Learning · Computer Science 2023-11-13 Jared Markowitz, Edward W. Staley

Trust Region-Guided Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO…

Machine Learning · Computer Science 2019-11-11 Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure

The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space. Do there exist better policies outside of this space? By using a novel surrogate objective that employs the sigmoid function…

Machine Learning · Computer Science 2022-12-06 Xing Chen, Dongcui Diao, Hechang Chen, Hengshuai Yao, Haiyin Piao, Zhixiao Sun, Zhiwei Yang, Randy Goebel, Bei Jiang, Yi Chang

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

PPO in the Fisher-Rao geometry

Proximal Policy Optimization (PPO) is widely used in reinforcement learning due to its strong empirical performance, yet it lacks formal guarantees for policy improvement and convergence. PPO's clipped surrogate objective is motivated by a…

Machine Learning · Computer Science 2026-02-02 Razvan-Andrei Lascu, David Šiška, Łukasz Szpruch

You May Not Need Ratio Clipping in PPO

Proximal Policy Optimization (PPO) methods learn a policy by iteratively performing multiple mini-batch optimization epochs of a surrogate objective with one set of sampled data. Ratio clipping PPO is a popular variant that clips the…

Machine Learning · Computer Science 2022-02-02 Mingfei Sun, Vitaly Kurin, Guoqing Liu, Sam Devlin, Tao Qin, Katja Hofmann, Shimon Whiteson

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle for efficient policy updates with hard constraint…

Machine Learning · Computer Science 2022-06-20 Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Bo Yuan, Xueqian Wang, Dacheng Tao

Proximal Policy Optimization via Enhanced Exploration Efficiency

Proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks. But the performance of this method is still affected by its exploration ability.…

Machine Learning · Computer Science 2020-11-12 Junwei Zhang, Zhenghao Zhang, Shuai Han, Shuai Lü

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress…

Machine Learning · Computer Science 2020-11-04 Perttu Hämäläinen, Amin Babadi, Xiaoxiao Ma, Jaakko Lehtinen

Enhancing PPO with Trajectory-Aware Hybrid Policies

Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms that has become a standard baseline in modern reinforcement learning with applications in numerous fields. Though it delivers stable…

Machine Learning · Computer Science 2025-02-25 Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R. Waite, Soumik Sarkar

Coordinated Proximal Policy Optimization

We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea lies in the coordinated adaptation of step size during the…

Artificial Intelligence · Computer Science 2021-11-09 Zifan Wu, Chao Yu, Deheng Ye, Junge Zhang, Haiyin Piao, Hankz Hankui Zhuo

A Logarithmic Barrier Method For Proximal Policy Optimization

Proximal policy optimization (PPO) has been proposed as a first-order optimization method for reinforcement learning. We should notice that an exterior penalty method is used in it. Often, the minimizers of the exterior penalty functions…

Machine Learning · Computer Science 2018-12-18 Cheng Zeng, Hongming Zhang

Joint action loss for proximal policy optimization

PPO (Proximal Policy Optimization) is a state-of-the-art policy gradient algorithm that has been successfully applied to complex computer games such as Dota 2 and Honor of Kings. In these environments, an agent makes compound actions…

Machine Learning · Computer Science 2023-01-27 Xiulei Song, Yizhao Jin, Greg Slabaugh, Simon Lucas