Related papers: Variational Model-based Policy Optimization

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic…

Machine Learning · Computer Science 2018-09-17 Ignasi Clavera , Jonas Rothfuss , John Schulman , Yasuhiro Fujita , Tamim Asfour , Pieter Abbeel

Model-based Policy Optimization with Unsupervised Model Adaptation

Model-based reinforcement learning methods learn a dynamics model with real data sampled from the environment and leverage it to generate simulated data to derive an agent. However, due to the potential distribution mismatch between…

Machine Learning · Computer Science 2020-10-29 Jian Shen , Han Zhao , Weinan Zhang , Yong Yu

Bidirectional Model-based Policy Optimization

Model-based reinforcement learning approaches leverage a forward dynamics model to support planning and decision making, which, however, may fail catastrophically if the model is inaccurate. Although there are several existing methods…

Machine Learning · Computer Science 2020-09-30 Hang Lai , Jian Shen , Weinan Zhang , Yong Yu

Constrained Variational Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual style approaches suffer from instability issues and lack optimality…

Machine Learning · Computer Science 2022-06-20 Zuxin Liu , Zhepeng Cen , Vladislav Isenbaev , Wei Liu , Zhiwei Steven Wu , Bo Li , Ding Zhao

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu , Garrett Thomas , Lantao Yu , Stefano Ermon , James Zou , Sergey Levine , Chelsea Finn , Tengyu Ma

Model-based Reinforcement Learning with Multi-step Plan Value Estimation

A promising way to improve the sample efficiency of reinforcement learning is model-based methods, in which many explorations and evaluations can happen in the learned models to save real-world samples. However, when the learned model has a…

Machine Learning · Computer Science 2022-09-14 Haoxin Lin , Yihao Sun , Jiaji Zhang , Yang Yu

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

Model-based reinforcement learning (RL) can be effectively supported at scale through the use of world models. However, in practice, scaling such approaches remains fundamentally limited. A commonly recognized challenge is model bias and…

Machine Learning · Computer Science 2026-05-27 Xiaoyuan Cheng , Wenxuan Yuan , Zhancun Mu , Yuanzhao Zhang , Yiming Yang , Hai Wang , Zhuo Sun , Che Liu

WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Vision-Language-Action (VLA) models have shown strong potential for general-purpose robotic manipulation, but their reliance on expert demonstrations limits their ability to learn from failures and perform self-corrections. Reinforcement…

Robotics · Computer Science 2025-11-13 Fangqi Zhu , Zhengyang Yan , Zicong Hong , Quanxin Shou , Xiao Ma , Song Guo

Model-Ensemble Trust-Region Policy Optimization

Model-free reinforcement learning (RL) methods are succeeding in a growing number of tasks, aided by recent advances in deep learning. However, they tend to suffer from high sample complexity, which hinders their use in real-world domains.…

Machine Learning · Computer Science 2018-10-08 Thanard Kurutach , Ignasi Clavera , Yan Duan , Aviv Tamar , Pieter Abbeel

Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization

Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. It has been verified that utilizing diffusion policies can significantly improve the performance of RL…

Machine Learning · Computer Science 2024-12-17 Shutong Ding , Ke Hu , Zhenhao Zhang , Kan Ren , Weinan Zhang , Jingyi Yu , Jingya Wang , Ye Shi

M3PO: Massively Multi-Task Model-Based Policy Optimization

We introduce Massively Multi-Task Model-Based Policy Optimization (M3PO), a scalable model-based reinforcement learning (MBRL) framework designed to address sample inefficiency in single-task settings and poor generalization in multi-task…

Machine Learning · Computer Science 2025-06-30 Aditya Narendra , Dmitry Makarov , Aleksandr Panov

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more outstanding in multi-agent tasks, as each step of operation is more costly…

Machine Learning · Computer Science 2022-09-05 Yali Du , Chengdong Ma , Yuchen Liu , Runji Lin , Hao Dong , Jun Wang , Yaodong Yang

Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

Meta-reinforcement learning (Meta-RL) has attracted attention due to its capability to enhance reinforcement learning (RL) algorithms, in terms of data efficiency and generalizability. In this paper, we develop a bilevel optimization…

Machine Learning · Computer Science 2024-10-15 Siyuan Xu , Minghui Zhu

Robust Model-free Reinforcement Learning with Multi-objective Bayesian Optimization

In reinforcement learning (RL), an autonomous agent learns to perform complex tasks by maximizing an exogenous reward signal while interacting with its environment. In real-world applications, test conditions may differ substantially from…

Robotics · Computer Science 2019-10-30 Matteo Turchetta , Andreas Krause , Sebastian Trimpe

SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets

Model-based offline reinforcement Learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the…

Computer Vision and Pattern Recognition · Computer Science 2024-06-17 Shenghua Wan , Ziyuan Chen , Le Gan , Shuai Feng , De-Chuan Zhan

How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior…

Machine Learning · Computer Science 2023-10-25 Hai Zhang , Hang Yu , Junqiao Zhao , Di Zhang , Chang Huang , Hongtu Zhou , Xiao Zhang , Chen Ye

When to Trust Your Model: Model-Based Policy Optimization

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy…

Machine Learning · Computer Science 2021-11-30 Michael Janner , Justin Fu , Marvin Zhang , Sergey Levine

BRPO: Batch Residual Policy Optimization

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum…

Machine Learning · Computer Science 2020-03-31 Sungryull Sohn , Yinlam Chow , Jayden Ooi , Ofir Nachum , Honglak Lee , Ed Chi , Craig Boutilier

Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts

Informed and robust decision making in the face of uncertainty is critical for robots that perform physical tasks alongside people. We formulate this as Bayesian Reinforcement Learning over latent Markov Decision Processes (MDPs). While…

Robotics · Computer Science 2020-02-11 Gilwoo Lee , Brian Hou , Sanjiban Choudhury , Siddhartha S. Srinivasa

Model Embedding Model-Based Reinforcement Learning

Model-based reinforcement learning (MBRL) has shown its advantages in sample-efficiency over model-free reinforcement learning (MFRL). Despite the impressive results it achieves, it still faces a trade-off between the ease of data…

Machine Learning · Computer Science 2020-06-17 Xiaoyu Tan , Chao Qu , Junwu Xiong , James Zhang