Related papers: Proximal Policy Optimization for Tracking Control …

Co-Training an Observer and an Evading Target

Reinforcement learning (RL) is already widely applied to applications such as robotics, but it is only sparsely used in sensor management. In this paper, we apply the popular Proximal Policy Optimization (PPO) approach to a multi-agent UAV…

Robotics · Computer Science 2022-10-21 André Brandenburger, Folker Hoffmann, Alexander Charlish

Policy Optimization for Continuous Reinforcement Learning

We study reinforcement learning (RL) in the setting of continuous time and space, for an infinite horizon with a discounted objective and the underlying dynamics driven by a stochastic differential equation. Built upon recent advances in…

Machine Learning · Computer Science 2023-10-19 Hanyang Zhao, Wenpin Tang, David D. Yao

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

Proximal Policy Optimization-Based Reinforcement Learning Approach for DC-DC Boost Converter Control: A Comparative Evaluation Against Traditional Control Techniques

This article proposes a proximal policy optimization (PPO)-based reinforcement learning (RL) approach for DC-DC boost converter control that is compared with traditional control methods. The performance of the PPO algorithm is evaluated…

Systems and Control · Electrical Eng. & Systems 2025-01-03 Utsab Saha, Atik Jawad, Shakib Shahria, A. B. M Harun-Ur Rashid

Combined Peak Reduction and Self-Consumption Using Proximal Policy Optimization

Residential demand response programs aim to activate demand flexibility at the household level. In recent years, reinforcement learning (RL) has gained significant attention for these types of applications. A major challenge of RL algorithms…

Systems and Control · Electrical Eng. & Systems 2024-03-13 Thijs Peirelinck, Chris Hermans, Fred Spiessens, Geert Deconinck

Trust Region-Guided Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO…

Machine Learning · Computer Science 2019-11-11 Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

Reflective Policy Optimization

On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

REBEL: Reinforcement Learning via Regressing Relative Rewards

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.…

Machine Learning · Computer Science 2024-12-11 Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Stable Policy Optimization via Off-Policy Divergence Regularization

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a…

Machine Learning · Computer Science 2020-06-22 Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress…

Machine Learning · Computer Science 2020-11-04 Perttu Hämäläinen, Amin Babadi, Xiaoxiao Ma, Jaakko Lehtinen

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Extending Group Relative Policy Optimization to Continuous Control: A Theoretical Framework for Robotic Reinforcement Learning

Group Relative Policy Optimization (GRPO) has shown promise in discrete action spaces by eliminating value function dependencies through group-based advantage estimation. However, its application to continuous control remains unexplored,…

Robotics · Computer Science 2025-07-29 Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, Satyasaran Changdar

Optimising Call Centre Operations using Reinforcement Learning: Value Iteration versus Proximal Policy Optimisation

This paper investigates the application of Reinforcement Learning (RL) to optimise call routing in call centres to minimise client waiting time and staff idle time. Two methods are compared: a model-based approach using Value Iteration (VI)…

Artificial Intelligence · Computer Science 2025-07-25 Kwong Ho Li, Wathsala Karunarathne

Guided Policy Optimization under Partial Observability

Reinforcement Learning (RL) in partially observable environments poses significant challenges due to the complexity of learning under uncertainty. While additional information, such as that available in simulations, can enhance training,…

Machine Learning · Computer Science 2026-03-16 Yueheng Li, Guangming Xie, Zongqing Lu

Decaying Clipping Range in Proximal Policy Optimization

Proximal Policy Optimization (PPO) is among the most widely used algorithms in reinforcement learning, which achieves state-of-the-art performance in many challenging problems. The keys to its success are the reliable policy updates through…

Machine Learning · Computer Science 2021-07-02 Mónika Farsang, Luca Szegletes

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Reinforcement learning (RL) has re-emerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the widely used Group Relative Policy Optimization (GRPO) algorithm to multi-turn…

Machine Learning · Computer Science 2026-01-27 Junbo Li, Peng Zhou, Rui Meng, Meet P. Vadera, Lihong Li, Yang Li

Proximal Deterministic Policy Gradient

This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the variable of…

Machine Learning · Computer Science 2020-08-04 Marco Maggipinto, Gian Antonio Susto, Pratik Chaudhari

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

In sequence generation tasks, many works use policy gradient for model optimization to tackle the intractable backpropagation issue when maximizing the non-differentiable evaluation metrics or fooling the discriminator in adversarial…

Computation and Language · Computer Science 2018-08-27 Yi-Lin Tuan, Jinzhi Zhang, Yujia Li, Hung-yi Lee

Proximal Policy Optimization with Relative Pearson Divergence

The recent remarkable progress of deep reinforcement learning (DRL) stands on regularization of policy for stable and efficient learning. A popular method, named proximal policy optimization (PPO), has been introduced for this purpose. PPO…

Machine Learning · Computer Science 2023-07-04 Taisuke Kobayashi