Related papers: Bidirectional Model-based Policy Optimization

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic…

Machine Learning · Computer Science 2018-09-17 Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel

Model-based Policy Optimization with Unsupervised Model Adaptation

Model-based reinforcement learning methods learn a dynamics model with real data sampled from the environment and leverage it to generate simulated data to derive an agent. However, due to the potential distribution mismatch between…

Machine Learning · Computer Science 2020-10-29 Jian Shen, Han Zhao, Weinan Zhang, Yong Yu

Variational Model-based Policy Optimization

Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL. However, designing such…

Machine Learning · Computer Science 2020-06-25 Yinlam Chow, Brandon Cui, MoonKyung Ryu, Mohammad Ghavamzadeh

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more outstanding in multi-agent tasks, as each step of operation is more costly…

Machine Learning · Computer Science 2022-09-05 Yali Du, Chengdong Ma, Yuchen Liu, Runji Lin, Hao Dong, Jun Wang, Yaodong Yang

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

Model-based reinforcement learning (RL) can be effectively supported at scale through the use of world models. However, in practice, scaling such approaches remains fundamentally limited. A commonly recognized challenge is model bias and…

Machine Learning · Computer Science 2026-05-27 Xiaoyuan Cheng, Wenxuan Yuan, Zhancun Mu, Yuanzhao Zhang, Yiming Yang, Hai Wang, Zhuo Sun, Che Liu

Towards Causal Model-Based Policy Optimization

Real-world decision-making problems are often marked by complex, uncertain dynamics that can shift or break under changing conditions. Traditional Model-Based Reinforcement Learning (MBRL) approaches learn predictive models of environment…

Machine Learning · Computer Science 2025-03-14 Alberto Caron, Vasilios Mavroudis, Chris Hicks

When to Trust Your Model: Model-Based Policy Optimization

Designing effective model-based reinforcement learning algorithms is difficult because the ease of data generation must be weighed against the bias of model-generated data. In this paper, we study the role of model usage in policy…

Machine Learning · Computer Science 2021-11-30 Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine

Maximum a Posteriori Policy Optimisation

We introduce a new algorithm for reinforcement learning called Maximum a Posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our…

Machine Learning · Computer Science 2018-06-25 Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller

Reflective Policy Optimization

On-policy reinforcement learning methods, like Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), often demand extensive data per update, leading to sample inefficiency. This paper introduces Reflective Policy…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Zhe Wu, Junliang Xing

How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly attributed to the high coupling between model learning and policy optimization. Many prior…

Machine Learning · Computer Science 2023-10-25 Hai Zhang, Hang Yu, Junqiao Zhao, Di Zhang, Chang Huang, Hongtu Zhou, Xiao Zhang, Chen Ye

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance…

Machine Learning · Computer Science 2019-12-02 Qi Zhou, Houqiang Li, Jie Wang

M3PO: Massively Multi-Task Model-Based Policy Optimization

We introduce Massively Multi-Task Model-Based Policy Optimization (M3PO), a scalable model-based reinforcement learning (MBRL) framework designed to address sample inefficiency in single-task settings and poor generalization in multi-task…

Machine Learning · Computer Science 2025-06-30 Aditya Narendra, Dmitry Makarov, Aleksandr Panov

On-Policy Model Errors in Reinforcement Learning

Model-free reinforcement learning algorithms can compute policy gradients given sampled environment transitions, but require large amounts of data. In contrast, model-based methods can use the learned model to generate new data, but model…

Machine Learning · Computer Science 2022-03-04 Lukas P. Fröhlich, Maksym Lefarov, Melanie N. Zeilinger, Felix Berkenkamp

Uncertainty-aware Model-based Policy Optimization

Model-based reinforcement learning has the potential to be more sample efficient than model-free approaches. However, existing model-based methods are vulnerable to model bias, which leads to poor generalization and asymptotic performance…

Machine Learning · Computer Science 2019-06-27 Tung-Long Vuong, Kenneth Tran

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

MAPO: Mixed Advantage Policy Optimization

Recent advances in reinforcement learning for foundation models, such as Group Relative Policy Optimization (GRPO), have significantly improved the performance of foundation models on reasoning tasks. Notably, the advantage function serves…

Artificial Intelligence · Computer Science 2025-09-26 Wenke Huang, Quan Zhang, Yiyang Fang, Jian Liang, Xuankun Rong, Huanjin Yao, Guancheng Wan, Ke Liang, Wenwen He, Mingjun Li, Leszek Rutkowski, Mang Ye, Bo Du, Dacheng Tao

Bounded Ratio Reinforcement Learning

Proximal Policy Optimization (PPO) has become the predominant algorithm for on-policy reinforcement learning due to its scalability and empirical robustness across domains. However, there is a significant disconnect between the underlying…

Machine Learning · Computer Science 2026-05-01 Yunke Ao, Le Chen, Bruce D. Lee, Assefa S. Wahd, Aline Czarnobai, Philipp Fürnstahl, Bernhard Schölkopf, Andreas Krause

A reinforcement learning approach to hybrid control design

In this paper we design hybrid control policies for hybrid systems whose mathematical models are unknown. Our contributions are threefold. First, we propose a framework for modelling the hybrid control design problem as a single Markov…

Systems and Control · Electrical Eng. & Systems 2020-09-03 Meet Gandhi, Atreyee Kundu, Shalabh Bhatnagar

ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm

Deep reinforcement learning has been able to solve various tasks successfully; however, due to the construction of policy gradient and training dynamics, tuning deep reinforcement learning models remains challenging. As one of the most…

Machine Learning · Computer Science 2026-02-11 Hanyong Wang, Menglong Yang