Related papers: Safe Value Functions

A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving

Reinforcement learning has emerged as an important approach for autonomous driving. A reward function is used in reinforcement learning to establish the learned skill objectives and guide the agent toward the optimal policy. Since…

Robotics · Computer Science 2026-03-05 Ahmed Abouelazm, Jonas Michel, J. Marius Zoellner

Constrained Exploration and Recovery from Experience Shaping

We consider the problem of reinforcement learning under safety requirements, in which an agent is trained to complete a given task, typically formalized as the maximization of a reward signal over time, while concurrently avoiding…

Machine Learning · Computer Science 2018-09-25 Tu-Hoa Pham, Giovanni De Magistris, Don Joven Agravante, Subhajit Chaudhury, Asim Munawar, Ryuki Tachibana

Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory

Guaranteeing safe behaviour of reinforcement learning (RL) policies poses significant challenges for safety-critical applications, despite RL's generality and scalability. To address this, we propose a new approach to apply verification…

Machine Learning · Computer Science 2023-12-06 Daniel C. H. Tan, Fernando Acero, Robert McCarthy, Dimitrios Kanoulas, Zhibin Li

Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards

Safe Reinforcement Learning (Safe RL) aims to train an RL agent to maximize its performance in real-world environments while adhering to safety constraints, as exceeding safety violation limits can result in severe consequences. In this…

Machine Learning · Computer Science 2025-04-07 Hanping Zhang, Yuhong Guo

ROSARL: Reward-Only Safe Reinforcement Learning

An important problem in reinforcement learning is designing agents that learn to solve tasks safely in an environment. A common solution is for a human expert to define either a penalty in the reward function or a cost to be minimised when…

Machine Learning · Computer Science 2023-06-02 Geraud Nangue Tasse, Tamlin Love, Mark Nemecek, Steven James, Benjamin Rosman

Combining Automated Optimisation of Hyperparameters and Reward Shape

There has been significant progress in deep reinforcement learning (RL) in recent years. Nevertheless, finding suitable hyperparameter configurations and reward functions remains challenging even for experts, and performance heavily relies…

Machine Learning · Computer Science 2024-10-10 Julian Dierkes, Emma Cramer, Holger H. Hoos, Sebastian Trimpe

Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies

We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the learner must satisfy. The baseline policy can arise from demonstration data or a teacher agent and may…

Machine Learning · Computer Science 2021-07-13 Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

Convergence of the Value Function in Optimal Control Problems with Unknown Dynamics

We deal with the convergence of the value function of an approximate control problem with uncertain dynamics to the value function of a nonlinear optimal control problem. The assumptions on the dynamics and the costs are rather general and…

Optimization and Control · Mathematics 2021-05-31 Andrea Pesare, Michele Palladino, Maurizio Falcone

On the continuity and smoothness of the value function in reinforcement learning and optimal control

The value function plays a crucial role as a measure for the cumulative future reward an agent receives in both reinforcement learning and optimal control. It is therefore of interest to study how similar the values of neighboring states…

Systems and Control · Electrical Eng. & Systems 2024-03-22 Hans Harder, Sebastian Peitz

To the Max: Reinventing Reward in Reinforcement Learning

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the…

Machine Learning · Computer Science 2025-02-25 Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt

Breaking the Safety-Capability Tradeoff: Reinforcement Learning with Verifiable Rewards Maintains Safety Guardrails in LLMs

Fine-tuning large language models (LLMs) for downstream tasks typically exhibits a fundamental safety-capability tradeoff, where improving task performance degrades safety alignment even on benign datasets. This degradation persists across…

Machine Learning · Computer Science 2025-11-27 Dongkyu Derek Cho, Huan Song, Arijit Ghosh Chowdhury, Haotian An, Yawei Wang, Rohit Thekkanal, Negin Sokhandan, Sharlina Keshava, Hannah Marlowe

Counterfactually Safe Reinforcement Learning

Reinforcement learning algorithms are generally designed to maximize the expected return across a population. However, a policy that is optimal on average may be suboptimal for certain individuals, leading to potential safety concerns. To…

Machine Learning · Statistics 2026-05-26 Jingyi Li, Peng Wu, Chengchun Shi

Behavior Alignment via Reward Function Optimization

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that…

Machine Learning · Computer Science 2023-11-01 Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

Cutting Your Losses: Learning Fault-Tolerant Control and Optimal Stopping under Adverse Risk

Recently, there has been a surge in interest in safe and robust techniques within reinforcement learning (RL). Current notions of risk in RL fail to capture the potential for systemic failures such as abrupt stoppages from system failures…

Systems and Control · Computer Science 2019-10-09 David Mguni

Learning-Enhanced Safeguard Control for High-Relative-Degree Systems: Robust Optimization under Disturbances and Faults

Merely pursuing performance may adversely affect the safety, while a conservative policy for safe exploration will degrade the performance. How to balance the safety and performance in learning-based control problems is an interesting yet…

Systems and Control · Electrical Eng. & Systems 2025-01-28 Xinyang Wang, Hongwei Zhang, Shimin Wang, Wei Xiao, Martin Guay

Approximate Optimal Control for Safety-Critical Systems with Control Barrier Functions

Control Barrier Functions (CBFs) have become a popular tool for enforcing set invariance in safety-critical control systems. While guaranteeing safety, most CBF approaches are myopic in the sense that they solve an optimization problem at…

Systems and Control · Electrical Eng. & Systems 2020-08-11 Max Cohen, Calin Belta

Performance Dynamics and Termination Errors in Reinforcement Learning: A Unifying Perspective

In reinforcement learning, a decision needs to be made at some point as to whether it is worthwhile to carry on with the learning process or to terminate it. In many such situations, stochastic elements are often present which govern the…

Machine Learning · Computer Science 2019-02-13 Nikki Lijing Kuang, Clement H. C. Leung

Reinforcement Learning with Convex Constraints

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the…

Machine Learning · Computer Science 2021-01-29 Sobhan Miryoosefi, Kianté Brantley, Hal Daumé, Miroslav Dudik, Robert Schapire

A Survey of Constraint Formulations in Safe Reinforcement Learning

Safety is critical when applying reinforcement learning (RL) to real-world problems. As a result, safe RL has emerged as a fundamental and powerful paradigm for optimizing an agent's policy while incorporating notions of safety. A prevalent…

Machine Learning · Computer Science 2024-05-09 Akifumi Wachi, Xun Shen, Yanan Sui

On the Robustness of Safe Reinforcement Learning under Observational Perturbations

Safe reinforcement learning (RL) trains a policy to maximize the task reward while satisfying safety constraints. While prior works focus on the performance optimality, we find that the optimal solutions of many safe RL problems are not…

Machine Learning · Computer Science 2023-03-03 Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Jie Tan, Bo Li, Ding Zhao