Related papers: Deep Constrained Q-learning

I'm sorry Dave, I'm afraid I can't do that, Deep Q-learning from forbidden action

The use of Reinforcement Learning (RL) is still restricted to simulation or to enhance human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety…

Machine Learning · Computer Science 2020-08-14 Mathieu Seurin, Philippe Preux, Olivier Pietquin

Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

During initial iterations of training in most Reinforcement Learning (RL) algorithms, agents perform a significant number of random exploratory steps. In the real world, this can limit the practicality of these algorithms as it can lead to…

Machine Learning · Computer Science 2022-10-17 Ashish Kumar Jayant, Shalabh Bhatnagar

Lane Change Decision-making through Deep Reinforcement Learning with Rule-based Constraints

Autonomous driving decision-making is a great challenge due to the complexity and uncertainty of the traffic environment. Combined with the rule-based constraints, a Deep Q-Network (DQN) based method is applied for autonomous driving lane…

Robotics · Computer Science 2019-04-03 Junjie Wang, Qichao Zhang, Dongbin Zhao, Yaran Chen

Using Deep Q-Learning to Control Optimization Hyperparameters

We present a novel definition of the reinforcement learning state, actions and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs…

Optimization and Control · Mathematics 2016-06-21 Samantha Hansen

Deep Inverse Q-learning with Constraints

Popular Maximum Entropy Inverse Reinforcement Learning approaches require the computation of expected state visitation frequencies for the optimal policy under an estimate of the reward function. This usually requires intermediate value…

Machine Learning · Computer Science 2020-08-05 Gabriel Kalweit, Maria Huegle, Moritz Werling, Joschka Boedecker

Automatic Reward Shaping from Confounded Offline Data

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions…

Artificial Intelligence · Computer Science 2025-09-10 Mingxuan Li, Junzhe Zhang, Elias Bareinboim

Constraints Penalized Q-learning for Safe Offline Reinforcement Learning

We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment.…

Machine Learning · Computer Science 2022-04-11 Haoran Xu, Xianyuan Zhan, Xiangyu Zhu

Constrained Markov Decision Processes via Backward Value Functions

Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints to satisfy (e.g. on safety…

Machine Learning · Computer Science 2020-08-28 Harsh Satija, Philip Amortila, Joelle Pineau

Value constrained model-free continuous control

The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially…

Robotics · Computer Science 2019-02-14 Steven Bohez, Abbas Abdolmaleki, Michael Neunert, Jonas Buchli, Nicolas Heess, Raia Hadsell

Constrained Exploration and Recovery from Experience Shaping

We consider the problem of reinforcement learning under safety requirements, in which an agent is trained to complete a given task, typically formalized as the maximization of a reward signal over time, while concurrently avoiding…

Machine Learning · Computer Science 2018-09-25 Tu-Hoa Pham, Giovanni De Magistris, Don Joven Agravante, Subhajit Chaudhury, Asim Munawar, Ryuki Tachibana

Research on Autonomous Driving Decision-making Strategies based Deep Reinforcement Learning

The behavior decision-making subsystem is a key component of the autonomous driving system, which reflects the decision-making ability of the vehicle and the driver, and is an important symbol of the high-level intelligence of the vehicle.…

Machine Learning · Computer Science 2024-12-31 Zixiang Wang, Hao Yan, Changsong Wei, Junyu Wang, Minheng Xiao

Automated Driving Maneuvers under Interactive Environment based on Deep Reinforcement Learning

Safe and efficient autonomous driving maneuvers in an interactive and complex environment can be considerably challenging due to the unpredictable actions of other surrounding agents that may be cooperative or adversarial in their…

Robotics · Computer Science 2019-01-28 Pin Wang, Ching-Yao Chan, Hanhan Li

Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning

By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For continuous action spaces, the most popular methods for off-policy learning include…

Machine Learning · Computer Science 2023-12-01 Jared Markowitz, Jesse Silverberg, Gary Collins

Deep Surrogate Q-Learning for Autonomous Driving

Challenging problems of deep reinforcement learning systems with regard to the application on real systems are their adaptivity to changing environments and their efficiency w.r.t. computational resources and data. In the application of…

Machine Learning · Computer Science 2022-02-18 Maria Kalweit, Gabriel Kalweit, Moritz Werling, Joschka Boedecker

Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values

While contemporary reinforcement learning research and applications have embraced policy gradient methods as the panacea of solving learning problems, value-based methods can still be useful in many domains as long as we can wrangle with…

Machine Learning · Computer Science 2024-07-16 Ashwin Ramaswamy, Ransalu Senanayake

Reinforcement Learning for Task Specifications with Action-Constraints

In this paper, we use concepts from supervisory control theory of discrete event systems to propose a method to learn optimal control policies for a finite-state Markov Decision Process (MDP) in which (only) certain sequences of actions are…

Machine Learning · Computer Science 2022-01-04 Arun Raman, Keerthan Shagrithaya, Shalabh Bhatnagar

Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective

Given a set of trajectories demonstrating the execution of a task safely in a constrained MDP with observable rewards but with unknown constraints and non-observable costs, we aim to find a policy that maximizes the likelihood of…

Machine Learning · Computer Science 2026-03-02 George Papadopoulos, George A. Vouros

Constraint-Guided Reinforcement Learning: Augmenting the Agent-Environment-Interaction

Reinforcement Learning (RL) agents have great successes in solving tasks with large observation and action spaces from limited feedback. Still, training the agents is data-intensive and there are no guarantees that the learned behavior is…

Artificial Intelligence · Computer Science 2021-10-20 Helge Spieker

Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning

Dialogue policy learning based on reinforcement learning is difficult to be applied to real users to train dialogue agents from scratch because of the high cost. User simulators, which choose random user goals for the dialogue agent to…

Computation and Language · Computer Science 2020-12-29 Yangyang Zhao, Zhenyu Wang, Zhenhua Huang

Reinforcement Learning with Convex Constraints

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the…

Machine Learning · Computer Science 2021-01-29 Sobhan Miryoosefi, Kianté Brantley, Hal Daumé, Miroslav Dudik, Robert Schapire