Related papers: Absolute Policy Optimization

200 papers

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science · 2020-01-15 · Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science · 2017-08-29 · John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy…

Machine Learning · Computer Science · 2017-04-24 · John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel

The policy represented by a deep neural network can overfit spurious features in observations, which hampers a reinforcement learning agent from learning an effective policy. This issue becomes severe in high-dimensional state, where the…

Machine Learning · Computer Science · 2023-05-01 · Md Masudur Rahman, Yexiang Xue

Model-free reinforcement learning algorithms have seen remarkable progress, but key challenges remain. Trust Region Policy Optimization (TRPO) is known for ensuring monotonic policy improvement through conservative updates within a trust…

Machine Learning · Computer Science · 2025-07-29 · Zhengpeng Xie, Qiang Zhang, Fan Yang, Marco Hutter, Renjing Xu

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science · 2024-06-07 · Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), two widely employed policy-based reinforcement learning (RL) methods, are prone to converge to a sub-optimal solution as they limit the policy representation…

Machine Learning · Computer Science · 2020-06-16 · Jun Song, Chaoyue Zhao

Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO…

Machine Learning · Computer Science · 2019-11-11 · Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science · 2017-05-31 · Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

Proximal Policy Optimization (PPO) dominates reinforcement learning and LLM alignment but relies on a "hard clipping" mechanism that discards valuable gradients. Conversely, unconstrained methods like SPO expose the optimization to…

Artificial Intelligence · Computer Science · 2026-05-07 · Yiheng Zhang, Yiming Wang, Kaiyan Zhao, Zhenglin Wan, Jiayu Chen, Leong Hou U

We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust…

Machine Learning · Computer Science · 2022-09-19 · Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh

Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms that has become a standard baseline in modern reinforcement learning with applications in numerous fields. Though it delivers stable…

Machine Learning · Computer Science · 2025-02-25 · Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R. Waite, Soumik Sarkar

Deep reinforcement learning agents frequently suffer from premature convergence, where early entropy collapse causes the policy to discard exploratory behaviors before discovering globally optimal strategies. We introduce Optimistic Policy…

Machine Learning · Computer Science · 2026-03-10 · Mai Pham, Vikrant Vaze, Peter Chin

Reinforcement Learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, it is known that learning a good policy is difficult in a domain where…

Machine Learning · Computer Science · 2019-06-27 · Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO)…

Machine Learning · Computer Science · 2021-12-09 · Xingxing Liang, Yang Ma, Yanghe Feng, Zhong Liu

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying…

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle for efficient policy updates with hard constraint…

Machine Learning · Computer Science · 2022-06-20 · Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Bo Yuan, Xueqian Wang, Dacheng Tao

Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high…

Machine Learning · Computer Science · 2018-11-20 · Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hualin He, Qing He, Pingzhong Tang

On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy…

Machine Learning · Computer Science · 2024-06-07 · Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is…

Machine Learning · Computer Science · 2022-11-07 · Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu