Related papers: Constrained Policy Optimization via Bayesian World…

Bayes-Adaptive Deep Model-Based Policy Optimisation

We introduce a Bayesian (deep) model-based reinforcement learning method (RoMBRL) that can capture model uncertainty to achieve sample-efficient policy optimisation. We propose to formulate the model-based policy optimisation problem as a…

Robotics · Computer Science 2021-01-06 Tai Hoang, Ngo Anh Vien

Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs

Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process…

Machine Learning · Computer Science 2021-03-03 Aria HasanzadeZonuzy, Archana Bura, Dileep Kalathil, Srinivas Shakkottai

Model-based Policy Optimization using Symbolic World Model

The application of learning-based control methods in robotics presents significant challenges. One is that model-free reinforcement learning algorithms use observation data with low sample efficiency. To address this challenge, a prevalent…

Machine Learning · Computer Science 2024-07-19 Andrey Gorodetskiy, Konstantin Mironov, Aleksandr Panov

Safe Planning and Policy Optimization via World Model Learning

Reinforcement Learning (RL) applications in real-world scenarios must prioritize safety and reliability, which impose strict constraints on agent behavior. Model-based RL leverages predictive world models for action planning and policy…

Artificial Intelligence · Computer Science 2025-06-06 Artem Latyshev, Gregory Gorbov, Aleksandr I. Panov

Constrained Markov Decision Processes via Backward Value Functions

Although Reinforcement Learning (RL) algorithms have found tremendous success in simulated domains, they often cannot directly be applied to physical systems, especially in cases where there are hard constraints to satisfy (e.g. on safety…

Machine Learning · Computer Science 2020-08-28 Harsh Satija, Philip Amortila, Joelle Pineau

Safety-Constrained Policy Transfer with Successor Features

In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints. This problem is important for safety-critical applications…

Machine Learning · Computer Science 2022-11-11 Zeyu Feng, Bowen Zhang, Jianxin Bi, Harold Soh

Robust Constrained-MDPs: Soft-Constrained Robust Policy Optimization under Model Uncertainty

In this paper, we focus on the problem of robustifying reinforcement learning (RL) algorithms with respect to model uncertainties. Indeed, in the framework of model-based RL, we propose to merge the theory of constrained Markov decision…

Machine Learning · Computer Science 2020-10-13 Reazul Hasan Russel, Mouhacine Benosman, Jeroen Van Baar

Value constrained model-free continuous control

The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially…

Robotics · Computer Science 2019-02-14 Steven Bohez, Abbas Abdolmaleki, Michael Neunert, Jonas Buchli, Nicolas Heess, Raia Hadsell

Bayesian Policy Optimization for Model Uncertainty

Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world. We formulate the problem of model uncertainty as a continuous Bayes-Adaptive Markov Decision Process (BAMDP), where an agent maintains a…

Robotics · Computer Science 2019-05-09 Gilwoo Lee, Brian Hou, Aditya Mandalika, Jeongseok Lee, Sanjiban Choudhury, Siddhartha S. Srinivasa

Optimal Policy Minimum Bayesian Risk

Inference scaling helps LLMs solve complex reasoning problems through extended runtime computation. On top of long chain-of-thought (long-CoT) models, purely inference-time techniques such as best-of-N (BoN) sampling, majority voting, or…

Machine Learning · Computer Science 2025-10-08 Ramón Fernandez Astudillo, Md Arafat Sultan, Aashka Trivedi, Yousef El-Kurdi, Tahira Naseem, Radu Florian, Salim Roukos

Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

During initial iterations of training in most Reinforcement Learning (RL) algorithms, agents perform a significant number of random exploratory steps. In the real world, this can limit the practicality of these algorithms as it can lead to…

Machine Learning · Computer Science 2022-10-17 Ashish Kumar Jayant, Shalabh Bhatnagar

MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models

Multi-robot systems can benefit from reinforcement learning (RL) algorithms that learn behaviours in a small number of trials, a property known as sample efficiency. This research thus investigates the use of learned world models to improve…

Robotics · Computer Science 2021-03-08 Daniël Willemsen, Mario Coppola, Guido C. H. E. de Croon

Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation

With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the…

Artificial Intelligence · Computer Science 2024-07-18 Detian Chu, Linyuan Bai, Jianuo Huang, Zhenlong Fang, Peng Zhang, Wei Kang, Haifeng Lin

Safe Continuous Control with Constrained Model-Based Policy Optimization

The applicability of reinforcement learning (RL) algorithms in real-world domains often requires adherence to safety constraints, a need difficult to address given the asymptotic nature of the classic RL optimization objective. In contrast…

Machine Learning · Computer Science 2021-04-15 Moritz A. Zanger, Karam Daaboul, J. Marius Zöllner

Constrained Meta Reinforcement Learning with Provable Test-Time Safety

Meta reinforcement learning (RL) allows agents to leverage experience across a distribution of tasks on which the agent can train at will, enabling faster learning of optimal policies on new test tasks. Despite its success in improving…

Machine Learning · Computer Science 2026-05-27 Tingting Ni, Maryam Kamgarpour

Constrained Style Learning from Imperfect Demonstrations under Task Optimality

Learning from demonstration has proven effective in robotics for acquiring natural behaviors, such as stylistic motions and lifelike agility, particularly when explicitly defining style-oriented reward functions is challenging. Synthesizing…

Robotics · Computer Science 2025-09-24 Kehan Wen, Chenhao Li, Junzhe He, Marco Hutter

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance…

Artificial Intelligence · Computer Science 2018-06-26 Daniel S. Brown, Scott Niekum

Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time

In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Despite the necessary attention of the scientific community, considering stochastic stopping time, the…

Machine Learning · Computer Science 2024-03-26 Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu

Model-Based Actor-Critic with Chance Constraint for Stochastic System

Safety is essential for reinforcement learning (RL) applied in real-world situations. Chance constraints are suitable to represent the safety requirements in stochastic systems. Previous chance-constrained RL methods usually have a low…

Machine Learning · Computer Science 2021-03-17 Baiyu Peng, Yao Mu, Yang Guan, Shengbo Eben Li, Yuming Yin, Jianyu Chen