Related papers: Proximal Policy Optimization with Evolutionary Mut…

Proximal Policy Optimization via Enhanced Exploration Efficiency

Proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks. But the performance of this method is still affected by its exploration ability.…

Machine Learning · Computer Science 2020-11-12 Junwei Zhang, Zhenghao Zhang, Shuai Han, Shuai Lü

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress…

Machine Learning · Computer Science 2020-11-04 Perttu Hämäläinen, Amin Babadi, Xiaoxiao Ma, Jaakko Lehtinen

Beyond the Boundaries of Proximal Policy Optimization

Proximal policy optimization (PPO) is a widely-used algorithm for on-policy reinforcement learning. This work offers an alternative perspective of PPO, in which it is decomposed into the inner-loop estimation of update vectors, and the…

Machine Learning · Computer Science 2024-11-04 Charlie B. Tan, Edan Toledo, Benjamin Ellis, Jakob N. Foerster, Ferenc Huszár

Evolutionary Policy Optimization

On-policy reinforcement learning (RL) algorithms are widely used for their strong asymptotic performance and training stability, but they struggle to scale with larger batch sizes, as additional parallel environments yield redundant data…

Machine Learning · Computer Science 2025-11-13 Jianren Wang, Yifan Su, Abhinav Gupta, Deepak Pathak

An Approximate Ascent Approach To Prove Convergence of PPO

Proximal Policy Optimization (PPO) is among the most widely used deep reinforcement learning algorithms, yet its theoretical foundations remain incomplete. Most importantly, convergence and understanding of fundamental PPO advantages remain…

Machine Learning · Computer Science 2026-02-04 Leif Doering, Daniel Schmidt, Moritz Melcher, Sebastian Kassing, Benedikt Wille, Tilman Aach, Simon Weissmann

Proximal Policy Optimization with Adaptive Exploration

Proximal Policy Optimization with Adaptive Exploration (axPPO) is introduced as a novel learning algorithm. This paper investigates the exploration-exploitation tradeoff within the context of reinforcement learning and aims to contribute…

Machine Learning · Computer Science 2024-05-09 Andrei Lixandru

Proximal Policy Optimization Smoothed Algorithm

Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each…

Machine Learning · Computer Science 2020-12-07 Wangshu Zhu, Andre Rosendo

Evolutionary Policy Optimization

A key challenge in reinforcement learning (RL) is managing the exploration-exploitation trade-off without sacrificing sample efficiency. Policy gradient (PG) methods excel in exploitation through fine-grained, gradient-based optimization…

Machine Learning · Computer Science 2025-04-18 Zelal Su "Lain" Mustafaoglu, Keshav Pingali, Risto Miikkulainen

ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm

Deep reinforcement learning has been able to solve various tasks successfully, however, due to the construction of policy gradient and training dynamics, tuning deep reinforcement learning models remains challenging. As one of the most…

Machine Learning · Computer Science 2026-02-11 Hanyong Wang, Menglong Yang

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration

Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO could cause an unexpected stability issue in the training…

Machine Learning · Computer Science 2022-12-14 Qisheng Zhang, Zhen Guo, Audun Jøsang, Lance M. Kaplan, Feng Chen, Dong H. Jeong, Jin-Hee Cho

Decaying Clipping Range in Proximal Policy Optimization

Proximal Policy Optimization (PPO) is among the most widely used algorithms in reinforcement learning, which achieves state-of-the-art performance in many challenging problems. The keys to its success are the reliable policy updates through…

Machine Learning · Computer Science 2021-07-02 Mónika Farsang, Luca Szegletes

AM-PPO: (Advantage) Alpha-Modulation with Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm that heavily relies on accurate advantage estimates for stable and efficient training. However, raw advantage signals can exhibit significant variance,…

Machine Learning · Computer Science 2025-05-22 Soham Sane

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

In sequence generation task, many works use policy gradient for model optimization to tackle the intractable backpropagation issue when maximizing the non-differentiable evaluation metrics or fooling the discriminator in adversarial…

Computation and Language · Computer Science 2018-08-27 Yi-Lin Tuan, Jinzhi Zhang, Yujia Li, Hung-yi Lee

Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts

Reinforcement learning (RL) has been successfully applied to solve the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the…

Machine Learning · Computer Science 2024-10-29 Sheryl Paul, Jyotirmoy V. Deshmukh

A dynamical clipping approach with task feedback for Proximal Policy Optimization

Proximal Policy Optimization (PPO) has been broadly applied to robotics learning, showcasing stable training performance. However, the fixed clipping bound setting may limit the performance of PPO. Specifically, there is no theoretical…

Machine Learning · Computer Science 2024-11-07 Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Hongyin Zhang, Jinxin Liu, Donglin Wang, Shuai Zhang

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior without aligning with human values. The dominant…

Machine Learning · Computer Science 2023-10-11 Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao

An Adaptive Clipping Approach for Proximal Policy Optimization

Very recently proximal policy optimization (PPO) algorithms have been proposed as first-order optimization methods for effective reinforcement learning. While PPO is inspired by the same learning theory that justifies trust region policy…

Machine Learning · Computer Science 2018-04-20 Gang Chen, Yiming Peng, Mengjie Zhang

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

Qualitative Differences Between Evolutionary Strategies and Reinforcement Learning Methods for Control of Autonomous Agents

In this paper we analyze the qualitative differences between evolutionary strategies and reinforcement learning algorithms by focusing on two popular state-of-the-art algorithms: the OpenAI-ES evolutionary strategy and the Proximal Policy…

Artificial Intelligence · Computer Science 2022-05-17 Nicola Milano, Stefano Nolfi