Related papers: Model-based Policy Optimization with Unsupervised …

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic…

Machine Learning · Computer Science 2018-09-17 Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more outstanding in multi-agent tasks, as each step of operation is more costly…

Machine Learning · Computer Science 2022-09-05 Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Bidirectional Model-based Policy Optimization

Model-based reinforcement learning approaches leverage a forward dynamics model to support planning and decision making, which, however, may fail catastrophically if the model is inaccurate. Although there are several existing methods…

Machine Learning · Computer Science 2020-09-30 Hang Lai, Jian Shen, Weinan Zhang, Yong Yu

Variational Model-based Policy Optimization

Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL. However, designing such…

Machine Learning · Computer Science 2020-06-25 Yinlam Chow, Brandon Cui, MoonKyung Ryu, Mohammad Ghavamzadeh

Uncertainty-aware Model-based Policy Optimization

Model-based reinforcement learning has the potential to be more sample efficient than model-free approaches. However, existing model-based methods are vulnerable to model bias, which leads to poor generalization and asymptotic performance…

Machine Learning · Computer Science 2019-06-27 Tung-Long Vuong, Kenneth Tran

On-Policy Model Errors in Reinforcement Learning

Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model…

Machine Learning · Computer Science 2022-03-04 Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger, Felix Berkenkamp

DROMO: Distributionally Robust Offline Model-based Policy Optimization

We consider the problem of offline reinforcement learning with model-based control, whose goal is to learn a dynamics model from the experience replay and obtain a pessimism-oriented agent under the learned model. Current model-based…

Machine Learning · Computer Science 2021-09-16 Ruizhen Liu, Dazhi Zhong, Zhicong Chen

Mismatched No More: Joint Model-Policy Optimization for Model-Based RL

Many model-based reinforcement learning (RL) methods follow a similar template: fit a model to previously observed data, and then use data from that model for RL or planning. However, models that achieve better training performance (e.g.,…

Machine Learning · Computer Science 2023-02-21 Benjamin Eysenbach, Alexander Khazatsky, Sergey Levine, Ruslan Salakhutdinov

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

Model-based reinforcement learning (RL) can be effectively supported at scale through the use of world models. However, in practice, scaling such approaches remains fundamentally limited. A commonly recognized challenge is model bias and…

Machine Learning · Computer Science 2026-05-27 Xiaoyuan Cheng, Wenxuan Yuan, Zhancun Mu, Yuanzhao Zhang, Yiming Yang, Hai Wang, Zhuo Sun, Che Liu

When to Trust Your Model: Model-Based Policy Optimization

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy…

Machine Learning · Computer Science 2021-11-30 Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

This paper investigates the model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretic analysis of return discrepancy upper…

Machine Learning · Computer Science 2022-03-18 Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou

Model predictive control-based value estimation for efficient reinforcement learning

Reinforcement learning is limited in practice primarily by the number of interactions it requires with virtual environments. This makes the problem challenging, because it is implausible to obtain a local optimal…

Machine Learning · Computer Science 2024-10-28 Qizhen Wu, Kexin Liu, Lei Chen

On Effective Scheduling of Model-based Reinforcement Learning

Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule the important hyperparameters to achieve…

Machine Learning · Computer Science 2022-07-06 Hang Lai, Jian Shen, Weinan Zhang, Yimin Huang, Xing Zhang, Ruiming Tang, Yong Yu, Zhenguo Li

Efficient Model-Based Reinforcement Learning for Robot Control via Online Optimization

We present an online model-based reinforcement learning algorithm suitable for controlling complex robotic systems directly in the real world. Unlike prevailing sim-to-real pipelines that rely on extensive offline simulation and model-free…

Robotics · Computer Science 2026-05-07 Fang Nan, Hao Ma, Qinghua Guan, Josie Hughes, Michael Muehlebach, Marco Hutter

Provably Efficient Model-based Policy Adaptation

The high sample complexity of reinforcement learning challenges its use in practice. A promising approach is to quickly adapt pre-trained policies to new environments. Existing methods for this policy adaptation problem typically rely on…

Machine Learning · Computer Science 2020-06-16 Yuda Song, Aditi Mavalankar, Wen Sun, Sicun Gao

Safe Planning and Policy Optimization via World Model Learning

Reinforcement Learning (RL) applications in real-world scenarios must prioritize safety and reliability, which impose strict constraints on agent behavior. Model-based RL leverages predictive world models for action planning and policy…

Artificial Intelligence · Computer Science 2025-06-06 Artem Latyshev, Gregory Gorbov, Aleksandr I. Panov

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Learning Powerful Policies by Using Consistent Dynamics Model

Model-based Reinforcement Learning approaches have the promise of being sample efficient. Much of the progress in learning dynamics models in RL has been made by learning models via supervised learning. But traditional model-based…

Machine Learning · Computer Science 2019-06-12 Shagun Sodhani, Anirudh Goyal, Tristan Deleu, Yoshua Bengio, Sergey Levine, Jian Tang

Model-Free Imitation Learning with Policy Optimization

In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or…

Machine Learning · Computer Science 2016-06-17 Jonathan Ho, Jayesh K. Gupta, Stefano Ermon

Dual Alignment Maximin Optimization for Offline Model-based RL

Offline reinforcement learning agents face significant deployment challenges due to the synthetic-to-real distribution mismatch. While most prior research has focused on improving the fidelity of synthetic sampling and incorporating…

Machine Learning · Computer Science 2025-10-01 Chi Zhou, Wang Luo, Haoran Li, Congying Han, Tiande Guo, Zicheng Zhang