Related papers: Meta-Model-Based Meta-Policy Optimization

M3PO: Massively Multi-Task Model-Based Policy Optimization

We introduce Massively Multi-Task Model-Based Policy Optimization (M3PO), a scalable model-based reinforcement learning (MBRL) framework designed to address sample inefficiency in single-task settings and poor generalization in multi-task…

Machine Learning · Computer Science 2025-06-30 Aditya Narendra, Dmitry Makarov, Aleksandr Panov

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic…

Machine Learning · Computer Science 2018-09-17 Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel

Model-Based Offline Meta-Reinforcement Learning with Regularization

Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these…

Machine Learning · Computer Science 2022-07-14 Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees

Model-based reinforcement learning (RL) is considered to be a promising approach to reduce the sample complexity that hinders model-free RL. However, the theoretical understanding of such methods has been rather limited. This paper…

Machine Learning · Computer Science 2021-02-16 Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL algorithms can be fairly complex due to the separate dynamics modeling and the subsequent planning algorithm, and as a…

Machine Learning · Computer Science 2021-03-01 Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning

Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration. Existing meta-RL algorithms are characterized by low sample efficiency, and mostly focus on low-dimensional…

Machine Learning · Computer Science 2024-03-18 Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small (or just a single) number of steps, is able…

Machine Learning · Computer Science 2022-09-28 Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai

Learn With Imagination: Safe Set Guided State-wise Constrained Policy Optimization

Deep reinforcement learning (RL) excels in various control tasks, yet the absence of safety guarantees hampers its real-world applicability. In particular, exploration during learning usually results in safety violations, while the RL…

Robotics · Computer Science 2025-06-04 Yifan Sun, Feihan Li, Weiye Zhao, Rui Chen, Tianhao Wei, Changliu Liu

Meta-Reinforcement Learning with Universal Policy Adaptation: Provable Near-Optimality under All-task Optimum Comparator

Meta-reinforcement learning (Meta-RL) has attracted attention due to its capability to enhance reinforcement learning (RL) algorithms, in terms of data efficiency and generalizability. In this paper, we develop a bilevel optimization…

Machine Learning · Computer Science 2024-10-15 Siyuan Xu, Minghui Zhu

Constrained Reinforcement Learning Under Model Mismatch

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied…

Machine Learning · Computer Science 2024-05-06 Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

Constrained Meta Reinforcement Learning with Provable Test-Time Safety

Meta reinforcement learning (RL) allows agents to leverage experience across a distribution of tasks on which the agent can train at will, enabling faster learning of optimal policies on new test tasks. Despite its success in improving…

Machine Learning · Computer Science 2026-05-27 Tingting Ni, Maryam Kamgarpour

Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models

Reward-based alignment methods for large language models (LLMs) face two key limitations: vulnerability to reward hacking, where models exploit flaws in the reward signal; and reliance on brittle, labor-intensive prompt engineering when…

Computation and Language · Computer Science 2025-05-20 Zae Myung Kim, Chanwoo Park, Vipul Raheja, Suin Kim, Dongyeop Kang

GRPOformer: Advancing Hyperparameter Optimization via Group Relative Policy Optimization

Hyperparameter optimization (HPO) plays a critical role in improving model performance. Transformer-based HPO methods have shown great potential; however, existing approaches rely heavily on large-scale historical optimization trajectories…

Machine Learning · Computer Science 2025-09-23 Haoxin Guo, Jiawen Pan, Weixin Zhai

Safe Planning and Policy Optimization via World Model Learning

Reinforcement Learning (RL) applications in real-world scenarios must prioritize safety and reliability, which impose strict constraints on agent behavior. Model-based RL leverages predictive world models for action planning and policy…

Artificial Intelligence · Computer Science 2025-06-06 Artem Latyshev, Gregory Gorbov, Aleksandr I. Panov

Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality

The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does…

Machine Learning · Computer Science 2025-09-23 Shaocong Ma, Ziyi Chen, Yi Zhou, Heng Huang

Model-Ensemble Trust-Region Policy Optimization

Model-free reinforcement learning (RL) methods are succeeding in a growing number of tasks, aided by recent advances in deep learning. However, they tend to suffer from high sample complexity, which hinders their use in real-world domains.…

Machine Learning · Computer Science 2018-10-08 Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel

When to Update Your Model: Constrained Model-based Reinforcement Learning

Designing and analyzing model-based RL (MBRL) algorithms with guaranteed monotonic improvement has been challenging, mainly due to the interdependence between policy optimization and model learning. Existing discrepancy bounds generally…

Machine Learning · Computer Science 2023-11-09 Tianying Ji, Yu Luo, Fuchun Sun, Mingxuan Jing, Fengxiang He, Wenbing Huang

Discovered Policy Optimisation

Tremendous progress has been made in reinforcement learning (RL) over the past decade. Most of these advancements came through the continual development of new algorithms, which were designed using a combination of mathematical derivations,…

Machine Learning · Computer Science 2022-10-14 Chris Lu, Jakub Grudzien Kuba, Alistair Letcher, Luke Metz, Christian Schroeder de Witt, Jakob Foerster

On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies

Modern meta-reinforcement learning (Meta-RL) methods are mainly developed based on model-agnostic meta-learning, which performs policy gradient steps across tasks to maximize policy performance. However, the gradient conflict problem is…

Artificial Intelligence · Computer Science 2022-09-22 Haozhi Wang, Qing Wang, Yunfeng Shao, Dong Li, Jianye Hao, Yinchuan Li