Related papers: COMBO: Conservative Offline Model-Based Policy Opt…

DROMO: Distributionally Robust Offline Model-based Policy Optimization

We consider the problem of offline reinforcement learning with model-based control, whose goal is to learn a dynamics model from the experience replay and obtain a pessimism-oriented agent under the learned model. Current model-based…

Machine Learning · Computer Science 2021-09-16 Ruizhen Liu, Dazhi Zhong, Zhicong Chen

RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to find performant policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy…

Machine Learning · Computer Science 2022-10-12 Marc Rigter, Bruno Lacerda, Nick Hawes

CROP: Conservative Reward for Model-based Offline Policy Optimization

Offline reinforcement learning (RL) aims to optimize a policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges because of their capability to mitigate…

Machine Learning · Computer Science 2026-04-14 Hao Li, Xiao-Hu Zhou, Shu-Hai Li, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Zeng-Guang Hou

COSBO: Conservative Offline Simulation-Based Policy Optimization

Offline reinforcement learning allows training reinforcement learning models on data from live deployments. However, it is limited to choosing the best combination of behaviors present in the training data. In contrast, simulation…

Machine Learning · Computer Science 2024-09-24 Eshagh Kargar, Ville Kyrki

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since…

Machine Learning · Computer Science 2023-03-06 Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner

POPO: Pessimistic Offline Policy Optimization

Offline reinforcement learning (RL), also known as batch RL, aims to optimize a policy from a large pre-recorded dataset without interaction with the environment. This setting offers the promise of utilizing diverse, pre-collected datasets to…

Machine Learning · Computer Science 2021-01-05 Qiang He, Xinwen Hou

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning

Offline reinforcement learning (RL) learns effective policies from pre-collected datasets, offering a practical solution for applications where online interactions are risky or costly. Model-based approaches are particularly advantageous…

Machine Learning · Computer Science 2026-05-14 Xuyang Chen, Keyu Yan, Guojian Wang, Lin Zhao

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance…

Machine Learning · Computer Science 2019-12-02 Qi Zhou, Houqiang Li, Jie Wang

Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning

Current approaches to model-based offline reinforcement learning often incorporate uncertainty-based reward penalization to address the distributional shift problem. These approaches, commonly known as pessimistic value iteration, use Monte…

Machine Learning · Computer Science 2025-01-17 Abdullah Akgül, Manuel Haußmann, Melih Kandemir

SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets

Model-based offline reinforcement learning (RL) is a promising approach that leverages existing data effectively in many real-world applications, especially those involving high-dimensional inputs like images and videos. To alleviate the…

Computer Vision and Pattern Recognition · Computer Science 2024-06-17 Shenghua Wan, Ziyuan Chen, Le Gan, Shuai Feng, De-Chuan Zhan

Bayesian Conservative Policy Optimization (BCPO): A Novel Uncertainty-Calibrated Offline Reinforcement Learning with Credible Lower Bounds

Offline reinforcement learning (RL) aims to learn decision policies from a fixed batch of logged transitions, without additional environment interaction. Despite remarkable empirical progress, offline RL remains fragile under distribution…

Methodology · Statistics 2026-03-16 Debashis Chatterjee

ROMO: Retrieval-enhanced Offline Model-based Optimization

Data-driven black-box model-based optimization (MBO) problems arise in a great number of practical application scenarios, where the goal is to find a design over the whole space maximizing a black-box target function based on a static…

Machine Learning · Computer Science 2023-10-20 Mingcheng Chen, Haoran Zhao, Yuxiang Zhao, Hulei Fan, Hongqiao Gao, Yong Yu, Zheng Tian

Behavior Proximal Policy Optimization

Offline reinforcement learning (RL) is a challenging setting where existing off-policy actor-critic methods perform poorly due to the overestimation of out-of-distribution state-action pairs. Thus, various additional augmentations are…

Machine Learning · Computer Science 2023-02-23 Zifeng Zhuang, Kun Lei, Jinxin Liu, Donglin Wang, Yilang Guo

Conservative quantum offline model-based optimization

Offline model-based optimization (MBO) refers to the task of optimizing a black-box objective function using only a fixed set of prior input-output data, without any active experimentation. Recent work has introduced quantum extremal…

Quantum Physics · Physics 2026-05-06 Kristian Sotirov, Annie E. Paine, Savvas Varsamopoulos, Antonio A. Gentile, Osvaldo Simeone

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Distributionally robust offline reinforcement learning (RL) aims to find a policy that performs the best under the worst environment within an uncertainty set using an offline dataset collected from a nominal model. While recent advances in…

Machine Learning · Computer Science 2025-01-07 Ruiquan Huang, Yingbin Liang, Jing Yang

Hallucinated Adversarial Control for Conservative Offline Policy Evaluation

We study the problem of conservative off-policy evaluation (COPE) where given an offline dataset of environment interactions, collected by other agents, we seek to obtain a (tight) lower bound on a policy's performance. This is crucial when…

Machine Learning · Computer Science 2023-05-29 Jonas Rothfuss, Bhavya Sukhija, Tobias Birchler, Parnian Kassraie, Andreas Krause

Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

We study model-based offline Reinforcement Learning with general function approximation without a full coverage assumption on the offline data distribution. We present an algorithm named Constrained Pessimistic Policy Optimization…

Machine Learning · Computer Science 2023-01-11 Masatoshi Uehara, Wen Sun

Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief

Model-based offline reinforcement learning (RL) aims to find a highly rewarding policy by leveraging a previously collected static dataset and a dynamics model. While the dynamics model is learned through reuse of the static dataset, its…

Machine Learning · Computer Science 2022-11-01 Kaiyang Guo, Yunfeng Shao, Yanhui Geng

Offline Reinforcement Learning for Mobility Robustness Optimization

In this work we revisit the Mobility Robustness Optimisation (MRO) algorithm and study the possibility of learning the optimal Cell Individual Offset tuning using offline Reinforcement Learning. Such methods make use of collected offline…

Networking and Internet Architecture · Computer Science 2025-07-01 Pegah Alizadeh, Anastasios Giovanidis, Pradeepa Ramachandra, Vasileios Koutsoukis, Osama Arouk