English
Related papers

Related papers: Projection-Based Constrained Policy Optimization

200 papers

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science 2017-05-31 Joshua Achiam , David Held , Aviv Tamar , Pieter Abbeel

Safe Reinforcement Learning (RL) often faces significant issues such as constraint violations and instability, necessitating the use of constrained policy optimization, which seeks optimal policies while ensuring adherence to specific…

Machine Learning · Computer Science 2025-08-07 Ning Yang , Pengyu Wang , Guoqing Liu , Haifeng Zhang , Pin Lv , Jun Wang

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to maximize the accumulated reward, it often learns to exploit loopholes and misspecifications in the reward signal resulting in unwanted behavior. While…

Machine Learning · Computer Science 2018-12-27 Chen Tessler , Daniel J. Mankowitz , Shie Mannor

Constrained reinforcement learning has achieved promising progress in safety-critical fields where both rewards and constraints are considered. However, constrained reinforcement learning methods face challenges in striking the right…

Machine Learning · Computer Science 2024-10-29 Jianmina Ma , Jingtian Ji , Yue Gao

Constrained Reinforcement Learning (RL) aims to maximize the return while adhering to predefined constraint limits, which represent domain-specific safety requirements. In continuous control settings, where learning agents govern system…

Machine Learning · Computer Science 2025-09-12 Somnath Hazra , Pallab Dasgupta , Soumyajit Dey

The problem of constrained reinforcement learning (CRL) holds significant importance as it provides a framework for addressing critical safety satisfaction concerns in the field of reinforcement learning (RL). However, with the introduction…

Machine Learning · Computer Science 2023-05-24 Chengbin Xuan , Feng Zhang , Faliang Yin , Hak-Keung Lam

This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on production and inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for…

Machine Learning · Computer Science 2023-02-06 Jaime Sabal Bermúdez , Antonio del Rio Chanona , Calvin Tsay

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle for efficient policy updates with hard constraint…

Machine Learning · Computer Science 2022-06-20 Linrui Zhang , Li Shen , Long Yang , Shixiang Chen , Bo Yuan , Xueqian Wang , Dacheng Tao

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang , Hao He , Chao Wen , Xiaoyang Tan

We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust…

Machine Learning · Computer Science 2022-09-19 Hung Le , Thommen Karimpanal George , Majid Abdolshah , Dung Nguyen , Kien Do , Sunil Gupta , Svetha Venkatesh

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied…

Machine Learning · Computer Science 2024-05-06 Zhongchang Sun , Sihong He , Fei Miao , Shaofeng Zou

Engagement-optimized adaptive tutoring systems may prioritize short-term behavioral signals over sustained learning outcomes, creating structural incentives for reward hacking in reinforcement learning policies. We formalize this challenge…

Artificial Intelligence · Computer Science 2026-04-07 Oluseyi Olukola , Nick Rahimi

Proximal policy optimization(PPO) has been proposed as a first-order optimization method for reinforcement learning. We should notice that an exterior penalty method is used in it. Often, the minimizers of the exterior penalty functions…

Machine Learning · Computer Science 2018-12-18 Cheng Zeng , Hongming Zhang

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang , Xiangfeng Luo , Tong Liu , Shaorong Xie , Jianshu Wang , Wei Wang , Yang Li , Yan Peng

We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the learner must satisfy. The baseline policy can arise from demonstration data or a teacher agent and may…

Machine Learning · Computer Science 2021-07-13 Tsung-Yen Yang , Justinian Rosca , Karthik Narasimhan , Peter J. Ramadge

Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy rollout-based safety constraints that can be evaluated…

Machine Learning · Computer Science 2026-05-21 Shengfan Cao , Francesco Borrelli , Eunhyek Joa

Proximal Policy Optimization (PPO) has been broadly applied to robotics learning, showcasing stable training performance. However, the fixed clipping bound setting may limit the performance of PPO. Specifically, there is no theoretical…

Machine Learning · Computer Science 2024-11-07 Ziqi Zhang , Jingzehua Xu , Zifeng Zhuang , Hongyin Zhang , Jinxin Liu , Donglin wang , Shuai Zhang

Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation. Reinforcement learning by policy optimization would be a natural way to solve this due to its…

We present a proximal policy optimization (PPO) agent trained through curriculum learning (CL) principles and meticulous reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of…

Machine Learning · Computer Science 2024-07-24 Abhijeet Pendyala , Asma Atamna , Tobias Glasmachers

Reinforcement learning (RL) has achieved promising results on most robotic control tasks. Safety of learning-based controllers is an essential notion of ensuring the effectiveness of the controllers. Current methods adopt whole consistency…

Robotics · Computer Science 2023-07-31 Haotian Xu , Shengjie Wang , Zhaolei Wang , Yunzhe Zhang , Qing Zhuo , Yang Gao , Tao Zhang
‹ Prev 1 2 3 10 Next ›