Related papers: Predictor-Corrector Policy Optimization

Projection-Based Constrained Policy Optimization

We consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm, Projection-Based Constrained Policy…

Machine Learning · Computer Science · 2020-10-08 · Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it -- a structure common in robotics,…

Machine Learning · Computer Science · 2026-05-15 · Matias Alvo, Daniel Russo, Yash Kanoria

On-Policy Model Errors in Reinforcement Learning

Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model…

Machine Learning · Computer Science · 2022-03-04 · Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger, Felix Berkenkamp

Enhancing PPO with Trajectory-Aware Hybrid Policies

Proximal policy optimization (PPO) is one of the most popular state-of-the-art on-policy algorithms that has become a standard baseline in modern reinforcement learning with applications in numerous fields. Though it delivers stable…

Machine Learning · Computer Science · 2025-02-25 · Qisai Liu, Zhanhong Jiang, Hsin-Jung Yang, Mahsa Khosravi, Joshua R. Waite, Soumik Sarkar

Gradient Informed Proximal Policy Optimization

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we…

Machine Learning · Computer Science · 2023-12-15 · Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, Ming C. Lin

An Adaptive Clipping Approach for Proximal Policy Optimization

Very recently, proximal policy optimization (PPO) algorithms have been proposed as first-order optimization methods for effective reinforcement learning. While PPO is inspired by the same learning theory that justifies trust region policy…

Machine Learning · Computer Science · 2018-04-20 · Gang Chen, Yiming Peng, Mengjie Zhang

Policy Improvement Reinforcement Learning

Reinforcement Learning with Verifiable Rewards (RLVR) has become a central post-training paradigm for improving the reasoning capabilities of large language models. Yet existing methods share a common blind spot: they optimize policies…

Machine Learning · Computer Science · 2026-04-29 · Huaiyang Wang, Xiaojie Li, Deqing Wang, Haoyi Zhou, Zixuan Huang, Yaodong Yang, Jianxin Li, Yikun Ban

Guided Policy Optimization under Partial Observability

Reinforcement Learning (RL) in partially observable environments poses significant challenges due to the complexity of learning under uncertainty. While additional information, such as that available in simulations, can enhance training,…

Machine Learning · Computer Science · 2026-03-16 · Yueheng Li, Guangming Xie, Zongqing Lu

POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization

Lead optimization in drug discovery requires efficiently navigating vast chemical space through iterative cycles to enhance molecular properties while preserving structural similarity to the original lead compound. Despite recent advances,…

Machine Learning · Computer Science · 2025-09-29 · Ziqing Wang, Yibo Wen, William Pattie, Xiao Luo, Weimin Wu, Jerry Yao-Chieh Hu, Abhishek Pandey, Han Liu, Kaize Ding

Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Recent advancements in robot learning have used imitation learning with large models and extensive demonstrations to develop effective policies. However, these models are often limited by the quantity, quality, and diversity of…

Robotics · Computer Science · 2024-12-19 · Xiu Yuan, Tongzhou Mu, Stone Tao, Yunhao Fang, Mengke Zhang, Hao Su

Constrained Proximal Policy Optimization

The problem of constrained reinforcement learning (CRL) holds significant importance as it provides a framework for addressing critical safety satisfaction concerns in the field of reinforcement learning (RL). However, with the introduction…

Machine Learning · Computer Science · 2023-05-24 · Chengbin Xuan, Feng Zhang, Faliang Yin, Hak-Keung Lam

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even though proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science · 2019-10-01 · Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

Learning Robust Controllers Via Probabilistic Model-Based Policy Search

Model-based Reinforcement Learning estimates the true environment through a world model in order to approximate the optimal policy. This family of algorithms usually benefits from better sample efficiency than their model-free counterparts.…

Machine Learning · Computer Science · 2021-10-27 · Valentin Charvet, Bjørn Sand Jensen, Roderick Murray-Smith

Proximal Policy Optimization and its Dynamic Version for Sequence Generation

In sequence generation tasks, many works use policy gradient for model optimization to tackle the intractable backpropagation issue when maximizing the non-differentiable evaluation metrics or fooling the discriminator in adversarial…

Computation and Language · Computer Science · 2018-08-27 · Yi-Lin Tuan, Jinzhi Zhang, Yujia Li, Hung-yi Lee

Accelerating Imitation Learning with Predictive Models

Sample efficiency is critical in solving real-world reinforcement learning problems, where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of…

Machine Learning · Computer Science · 2018-10-16 · Ching-An Cheng, Xinyan Yan, Evangelos A. Theodorou, Byron Boots

PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models

While reinforcement learning has advanced the alignment of text-to-image (T2I) models, state-of-the-art policy gradient methods are still hampered by training instability and high variance, hindering convergence speed and compromising image…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-25 · Jeongjae Lee, Jong Chul Ye

Clipped-Objective Policy Gradients for Pessimistic Policy Optimization

To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences.…

Machine Learning · Computer Science · 2023-11-13 · Jared Markowitz, Edward W. Staley

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science · 2017-08-29 · John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
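The abstract above describes optimizing a "surrogate" objective by stochastic gradient ascent on sampled data. For illustration, here is a minimal sketch (not the authors' code) that evaluates the clipped surrogate used by PPO-Clip, L^CLIP = mean_t[min(r_t A_t, clip(r_t, 1 - ε, 1 + ε) A_t)], where r_t = π_new(a_t|s_t) / π_old(a_t|s_t). The function name `ppo_clip_objective` and the plain-list inputs are assumptions. The function only computes the objective's value; the gradient-ascent step would in practice use an autodiff framework.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Mean clipped surrogate objective over a batch of sampled actions.

    Hypothetical helper: inputs are per-sample log-probabilities of the taken
    actions under the new and old policies, plus advantage estimates.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic choice: the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage: one sample with positive advantage, one with negative advantage.
value = ppo_clip_objective(
    new_logp=[math.log(1.5), math.log(1.5)],
    old_logp=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(round(value, 6))
```

With ratio 1.5 and ε = 0.2, the positive-advantage sample is capped at 1.2, while the negative-advantage sample keeps the unclipped -1.5. Taking the minimum therefore removes the incentive to move the ratio far outside the clip range, but it never hides a loss.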

Natural Policy Gradients In Reinforcement Learning Explained

Traditional policy gradient methods are fundamentally flawed. Natural gradients converge quicker and better, forming the foundation of contemporary Reinforcement Learning such as Trust Region Policy Optimization (TRPO) and Proximal Policy…

Machine Learning · Computer Science · 2022-09-07 · W. J. A. van Heeswijk
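The abstract above contrasts ordinary and natural policy gradients. A natural gradient preconditions the ordinary gradient ∇J by the inverse Fisher information matrix of the policy, giving F⁻¹∇J. The sketch below makes several assumptions that do not come from the paper: a single-state task, a one-dimensional Gaussian policy N(μ, σ²) parameterized by (μ, log σ), and a quadratic reward -(a - target)². The helper names are also hypothetical. Under these assumptions F = diag(1/σ², 2), so inverting it reduces to elementwise division.

```python
from typing import Tuple


def gaussian_fisher_diag(sigma: float) -> Tuple[float, float]:
    """Fisher information of N(mu, sigma^2) in (mu, log_sigma) coordinates.

    The off-diagonal entries are zero, so only the diagonal (1/sigma^2, 2) is returned.
    """
    return 1.0 / sigma**2, 2.0


def natural_gradient(grad: Tuple[float, float], sigma: float) -> Tuple[float, float]:
    """Precondition a (dJ/dmu, dJ/dlog_sigma) gradient by the inverse Fisher matrix."""
    f_mu, f_log_sigma = gaussian_fisher_diag(sigma)
    # F is diagonal, so F^{-1} g is an elementwise division.
    return grad[0] / f_mu, grad[1] / f_log_sigma


def expected_reward_grad(mu: float, sigma: float, target: float) -> Tuple[float, float]:
    """Exact gradient of J = E[-(a - target)^2] = -((mu - target)^2 + sigma^2), a ~ N(mu, sigma^2)."""
    d_mu = -2.0 * (mu - target)
    # dJ/dsigma = -2 sigma, and dsigma/dlog_sigma = sigma.
    d_log_sigma = -2.0 * sigma**2
    return d_mu, d_log_sigma


# Usage: the policy is centered at 0 with sigma = 0.5, and the reward peaks at 1.
g = expected_reward_grad(mu=0.0, sigma=0.5, target=1.0)
ng = natural_gradient(g, sigma=0.5)
print(g, ng)
```

The natural step for μ equals the ordinary step multiplied by σ². A narrow policy therefore moves its mean more cautiously. This matches the fact that the KL divergence between Gaussians with a shared σ grows as (Δμ)²/(2σ²), which is the same kind of distance that trust-region methods such as TRPO constrain.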

Policy Prediction Network: Model-Free Behavior Policy with Model-Based Learning in Continuous Action Space

This paper proposes a novel deep reinforcement learning architecture that was inspired by previous tree-structured architectures which were only usable in discrete action spaces. Policy Prediction Network offers a way to improve sample…

Machine Learning · Computer Science · 2019-09-18 · Zac Wellmer, James Kwok