Related papers: Projection-Based Constrained Policy Optimization

Constrained Policy Optimization

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science 2017-05-31 Joshua Achiam , David Held , Aviv Tamar , Pieter Abbeel

Proactive Constrained Policy Optimization with Preemptive Penalty

Safe Reinforcement Learning (RL) often faces significant issues such as constraint violations and instability, necessitating the use of constrained policy optimization, which seeks optimal policies while ensuring adherence to specific…

Machine Learning · Computer Science 2025-08-07 Ning Yang , Pengyu Wang , Guoqing Liu , Haifeng Zhang , Pin Lv , Jun Wang

Reward Constrained Policy Optimization

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to maximize the accumulated reward, it often learns to exploit loopholes and misspecifications in the reward signal resulting in unwanted behavior. While…

Machine Learning · Computer Science 2018-12-27 Chen Tessler , Daniel J. Mankowitz , Shie Mannor

Adversarial Constrained Policy Optimization: Improving Constrained Reinforcement Learning by Adapting Budgets

Constrained reinforcement learning has achieved promising progress in safety-critical fields where both rewards and constraints are considered. However, constrained reinforcement learning methods face challenges in striking the right…

Machine Learning · Computer Science 2024-10-29 Jianmina Ma , Jingtian Ji , Yue Gao

Incentivizing Safer Actions in Policy Optimization for Constrained Reinforcement Learning

Constrained Reinforcement Learning (RL) aims to maximize the return while adhering to predefined constraint limits, which represent domain-specific safety requirements. In continuous control settings, where learning agents govern system…

Machine Learning · Computer Science 2025-09-12 Somnath Hazra , Pallab Dasgupta , Soumyajit Dey

Constrained Proximal Policy Optimization

The problem of constrained reinforcement learning (CRL) holds significant importance as it provides a framework for addressing critical safety satisfaction concerns in the field of reinforcement learning (RL). However, with the introduction…

Machine Learning · Computer Science 2023-05-24 Chengbin Xuan , Feng Zhang , Faliang Yin , Hak-Keung Lam

Distributional constrained reinforcement learning for supply chain optimization

This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on production and inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for…

Machine Learning · Computer Science 2023-02-06 Jaime Sabal Bermúdez , Antonio del Rio Chanona , Calvin Tsay

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle for efficient policy updates with hard constraint…

Machine Learning · Computer Science 2022-06-20 Linrui Zhang , Li Shen , Long Yang , Shixiang Chen , Bo Yuan , Xueqian Wang , Dacheng Tao

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang , Hao He , Chao Wen , Xiaoyang Tan

Learning to Constrain Policy Optimization with Virtual Trust Region

We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust…

Machine Learning · Computer Science 2022-09-19 Hung Le , Thommen Karimpanal George , Majid Abdolshah , Dung Nguyen , Kien Do , Sunil Gupta , Svetha Venkatesh

Constrained Reinforcement Learning Under Model Mismatch

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied…

Machine Learning · Computer Science 2024-05-06 Zhongchang Sun , Sihong He , Fei Miao , Shaofeng Zou

MC-CPO: Mastery-Conditioned Constrained Policy Optimization

Engagement-optimized adaptive tutoring systems may prioritize short-term behavioral signals over sustained learning outcomes, creating structural incentives for reward hacking in reinforcement learning policies. We formalize this challenge…

Artificial Intelligence · Computer Science 2026-04-07 Oluseyi Olukola , Nick Rahimi

A Logarithmic Barrier Method For Proximal Policy Optimization

Proximal policy optimization (PPO) has been proposed as a first-order optimization method for reinforcement learning. We should notice that an exterior penalty method is used in it. Often, the minimizers of the exterior penalty functions…

Machine Learning · Computer Science 2018-12-18 Cheng Zeng , Hongming Zhang

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang , Xiangfeng Luo , Tong Liu , Shaorong Xie , Jianshu Wang , Wei Wang , Yang Li , Yan Peng

Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies

We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the learner must satisfy. The baseline policy can arise from demonstration data or a teacher agent and may…

Machine Learning · Computer Science 2021-07-13 Tsung-Yen Yang , Justinian Rosca , Karthik Narasimhan , Peter J. Ramadge

Constrained Policy Optimization via Sampling-Based Weight-Space Projection

Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy rollout-based safety constraints that can be evaluated…

Machine Learning · Computer Science 2026-05-21 Shengfan Cao , Francesco Borrelli , Eunhyek Joa

A dynamical clipping approach with task feedback for Proximal Policy Optimization

Proximal Policy Optimization (PPO) has been broadly applied to robotics learning, showcasing stable training performance. However, the fixed clipping bound setting may limit the performance of PPO. Specifically, there is no theoretical…

Machine Learning · Computer Science 2024-11-07 Ziqi Zhang , Jingzehua Xu , Zifeng Zhuang , Hongyin Zhang , Jinxin Liu , Donglin Wang , Shuai Zhang

Chance Constrained Policy Optimization for Process Control and Optimization

Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation. Reinforcement learning by policy optimization would be a natural way to solve this due to its…

Systems and Control · Electrical Eng. & Systems 2020-12-18 Panagiotis Petsagkourakis , Ilya Orson Sandoval , Eric Bradford , Federico Galvanin , Dongda Zhang , Ehecatl Antonio del Rio-Chanona

Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering

We present a proximal policy optimization (PPO) agent trained through curriculum learning (CL) principles and meticulous reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of…

Machine Learning · Computer Science 2024-07-24 Abhijeet Pendyala , Asma Atamna , Tobias Glasmachers

Efficient Exploration Using Extra Safety Budget in Constrained Policy Optimization

Reinforcement learning (RL) has achieved promising results on most robotic control tasks. Safety of learning-based controllers is an essential notion of ensuring the effectiveness of the controllers. Current methods adopt whole consistency…

Robotics · Computer Science 2023-07-31 Haotian Xu , Shengjie Wang , Zhaolei Wang , Yunzhe Zhang , Qing Zhuo , Yang Gao , Tao Zhang