
Related papers: Proximal Policy Optimization for Tracking Control …

200 papers

Reinforcement learning (RL) is already widely used in applications such as robotics, but it is only sparsely used in sensor management. In this paper, we apply the popular Proximal Policy Optimization (PPO) approach to a multi-agent UAV…

Robotics · Computer Science 2022-10-21 André Brandenburger, Folker Hoffmann, Alexander Charlish

We study reinforcement learning (RL) in the setting of continuous time and space, for an infinite horizon with a discounted objective and the underlying dynamics driven by a stochastic differential equation. Built upon recent advances in…

Machine Learning · Computer Science 2023-10-19 Hanyang Zhao, Wenpin Tang, David D. Yao

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
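The "surrogate" objective in the Schulman et al. entry is, in PPO's best-known form, a clipped probability-ratio objective. The sketch below only evaluates that objective for a batch. It assumes per-sample log-probabilities under the new and old policies and precomputed advantage estimates are already available. The function name `clipped_surrogate` and the default clip range `eps=0.2` are illustrative choices, not taken from this listing. The stochastic gradient ascent step itself would need an autodiff framework.

```python
import math
from typing import Sequence


def clipped_surrogate(logp_new: Sequence[float],
                      logp_old: Sequence[float],
                      advantages: Sequence[float],
                      eps: float = 0.2) -> float:
    """Batch mean of a PPO-style clipped surrogate objective (to be maximized).

    Hypothetical helper: inputs are per-sample log pi(a|s) under the new and
    old policies plus advantage estimates; eps=0.2 is an illustrative default.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # r_t = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r_t, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)         # keep the pessimistic term
    return total / len(advantages)


if __name__ == "__main__":
    # Same policy: every ratio is 1, so the objective is the mean advantage.
    print(clipped_surrogate([0.0, 0.0], [0.0, 0.0], [1.0, -2.0]))  # -0.5
    # Ratio 2 with a positive advantage is capped at 1 + eps.
    print(clipped_surrogate([math.log(2.0)], [0.0], [1.0]))  # 1.2
```

Taking the minimum of the clipped and unclipped terms means a large ratio cannot inflate the objective when the advantage is positive, while a large ratio with a negative advantage is still fully penalized.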

This article proposes a proximal policy optimization (PPO)-based reinforcement learning (RL) approach for DC-DC boost converter control that is compared with traditional control methods. The performance of the PPO algorithm is evaluated…

Systems and Control · Electrical Eng. & Systems 2025-01-03 Utsab Saha, Atik Jawad, Shakib Shahria, A. B. M Harun-Ur Rashid

Residential demand response programs aim to activate demand flexibility at the household level. In recent years, reinforcement learning (RL) has gained significant attention for this type of application. A major challenge of RL algorithms…

Systems and Control · Electrical Eng. & Systems 2024-03-13 Thijs Peirelinck, Chris Hermans, Fred Spiessens, Geert Deconinck

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO…

Machine Learning · Computer Science 2019-11-11 Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.…

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a…

Machine Learning · Computer Science 2020-06-22 Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

Proximal Policy Optimization (PPO) is a highly popular model-free reinforcement learning (RL) approach. However, we observe that in a continuous action space, PPO can prematurely shrink the exploration variance, which leads to slow progress…

Machine Learning · Computer Science 2020-11-04 Perttu Hämäläinen, Amin Babadi, Xiaoxiao Ma, Jaakko Lehtinen

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

Group Relative Policy Optimization (GRPO) has shown promise in discrete action spaces by eliminating value function dependencies through group-based advantage estimation. However, its application to continuous control remains unexplored,…

Robotics · Computer Science 2025-07-29 Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, Satyasaran Changdar
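The group-based advantage estimation mentioned in the GRPO entry replaces a learned value baseline with statistics computed over a group of sampled outputs. A minimal sketch follows. It assumes one scalar reward per sampled output in a group, normalized by the group mean and the population standard deviation (some implementations use the sample standard deviation instead). `group_relative_advantages` and `eps` are hypothetical names. This shows only the commonly used advantage formula and does not reproduce the continuous-control adaptation proposed in that paper.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its group: (r_i - mean) / (std + eps).

    Hypothetical helper: `rewards` holds one scalar reward per output sampled
    for the same prompt/state; using the population std is an assumption.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # No learned value function: the group mean acts as the baseline.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # approximately [1, -1, 1, -1]
    print(group_relative_advantages([0.5, 0.5, 0.5]))       # [0.0, 0.0, 0.0]
```

When every output in a group receives the same reward, all advantages are zero, so that group contributes no learning signal; the small `eps` only prevents division by zero in that case.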

This paper investigates the application of Reinforcement Learning (RL) to optimise call routing in call centres to minimise client waiting time and staff idle time. Two methods are compared: a model-based approach using Value Iteration (VI)…

Artificial Intelligence · Computer Science 2025-07-25 Kwong Ho Li, Wathsala Karunarathne
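For reference, the Value Iteration baseline named in the call-routing entry repeatedly applies the Bellman optimality backup until the value estimates stop changing, then reads off a greedy policy. The sketch below runs it on a generic finite MDP given as transition lists. The toy MDP, the `value_iteration` and `q_value` signatures, and the reward-maximizing convention (a cost such as waiting time would enter as a negative reward) are assumptions for illustration, not the paper's call-centre model.

```python
from typing import List, Tuple

# Hypothetical MDP encoding: P[s][a] is a list of (probability, next_state)
# pairs and R[s][a] is the expected immediate reward for taking action a in s.
Transitions = List[List[List[Tuple[float, int]]]]
Rewards = List[List[float]]


def q_value(P: Transitions, R: Rewards, V: List[float],
            s: int, a: int, gamma: float) -> float:
    """One-step lookahead: R(s, a) + gamma * E[V(s')]."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(P: Transitions, R: Rewards, gamma: float = 0.9,
                    tol: float = 1e-10) -> Tuple[List[float], List[int]]:
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            # Bellman optimality backup, applied in place (Gauss-Seidel style).
            best = max(q_value(P, R, V, s, a, gamma) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(range(len(P[s])), key=lambda a: q_value(P, R, V, s, a, gamma))
              for s in range(len(P))]
    return V, policy


if __name__ == "__main__":
    # Toy 1-state MDP: action 0 earns 1 per step, action 1 earns 0; both self-loop.
    P = [[[(1.0, 0)], [(1.0, 0)]]]
    R = [[1.0, 0.0]]
    V, policy = value_iteration(P, R, gamma=0.9)
    print(round(V[0], 6), policy)  # 10.0 [0]
```

Because the model `P` and `R` must be known in advance, this is the "model-based" side of such a comparison; a model-free method like PPO learns from sampled interaction instead.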

Reinforcement Learning (RL) in partially observable environments poses significant challenges due to the complexity of learning under uncertainty. While additional information, such as that available in simulations, can enhance training,…

Machine Learning · Computer Science 2026-03-16 Yueheng Li, Guangming Xie, Zongqing Lu

Proximal Policy Optimization (PPO) is among the most widely used algorithms in reinforcement learning, which achieves state-of-the-art performance in many challenging problems. The keys to its success are the reliable policy updates through…

Machine Learning · Computer Science 2021-07-02 Mónika Farsang, Luca Szegletes

Reinforcement learning (RL) has re-emerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the widely used Group Relative Policy Optimization (GRPO) algorithm to multi-turn…

Machine Learning · Computer Science 2026-01-27 Junbo Li, Peng Zhou, Rui Meng, Meet P. Vadera, Lihong Li, Yang Li

This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration. The target network plays the role of the variable of…

Machine Learning · Computer Science 2020-08-04 Marco Maggipinto, Gian Antonio Susto, Pratik Chaudhari

In sequence generation tasks, many works use policy gradients for model optimization to tackle the intractable backpropagation issue when maximizing the non-differentiable evaluation metrics or fooling the discriminator in adversarial…

Computation and Language · Computer Science 2018-08-27 Yi-Lin Tuan, Jinzhi Zhang, Yujia Li, Hung-yi Lee

The recent remarkable progress of deep reinforcement learning (DRL) stands on regularization of policy for stable and efficient learning. A popular method, named proximal policy optimization (PPO), has been introduced for this purpose. PPO…

Machine Learning · Computer Science 2023-07-04 Taisuke Kobayashi