Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning

Akshay Mete; Shahid Aamir Sheikh; Tzu-Hsiang Lin; Dileep Kalathil; P. R. Kumar

Machine Learning | 2026-02-11 | v1 | Artificial Intelligence | Systems and Control

Abstract

Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that brings classical reward-biased maximum likelihood estimation (RBMLE) from adaptive control into deep RL. In contrast to upper confidence bound (UCB)-style exploration methods, OWMs incorporate optimism directly into model learning by augmenting the model-learning objective with an optimistic dynamics loss that biases imagined transitions toward higher-reward outcomes. This fully gradient-based loss requires neither uncertainty estimates nor constrained optimization. Our approach is plug-and-play with existing world model frameworks, preserving scalability while requiring only minimal modifications to standard training procedures. We instantiate OWMs within two state-of-the-art world model architectures, yielding Optimistic DreamerV3 and Optimistic STORM, which demonstrate significant improvements in sample efficiency and cumulative return compared to their baseline counterparts.
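
To make the reward-biasing idea concrete, the following is a minimal sketch of classical RBMLE on a toy problem, not the optimistic dynamics loss used in OWMs. It assumes a two-armed Bernoulli bandit with a small finite set of candidate models; the names `CANDIDATES`, `COUNTS`, `log_likelihood`, `optimal_value`, `rbmle_select`, and the `bias` weight are all hypothetical and chosen only for illustration. The selected model maximizes the data log-likelihood plus a bias term proportional to the best reward achievable under that model.

```python
import math

# Hypothetical toy setup: a two-armed Bernoulli bandit with a finite set of
# candidate models, each a tuple of per-arm success probabilities.
CANDIDATES = [(0.5, 0.4), (0.5, 0.8)]

# Observed (successes, failures) per arm; arm 1 has barely been tried.
COUNTS = [(5, 5), (1, 1)]


def log_likelihood(model, counts=COUNTS):
    """Log-likelihood of the observed counts under a candidate model."""
    return sum(
        s * math.log(p) + f * math.log(1.0 - p)
        for p, (s, f) in zip(model, counts)
    )


def optimal_value(model):
    """Best expected one-step reward if the candidate model were true."""
    return max(model)


def rbmle_select(candidates, bias):
    """Pick the model maximizing log-likelihood + bias * optimal value."""
    return max(
        candidates,
        key=lambda m: log_likelihood(m) + bias * optimal_value(m),
    )


# bias = 0 recovers plain maximum likelihood; a positive bias tilts the
# estimate toward models that promise higher reward.
mle_choice = rbmle_select(CANDIDATES, bias=0.0)
optimistic_choice = rbmle_select(CANDIDATES, bias=2.0)
print(mle_choice, optimistic_choice)
```

With these counts, plain maximum likelihood prefers the model in which the rarely tried arm is poor, while the reward-biased estimate prefers the model in which that arm is good, which in turn would steer an agent toward trying it. In a deep RL setting the finite candidate set would be replaced by gradient-based training of a learned world model; that step is not shown here.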

Keywords

world model; reinforcement learning; extreme learning machine

Cite

@article{arxiv.2602.10044,
  title  = {Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning},
  author = {Akshay Mete and Shahid Aamir Sheikh and Tzu-Hsiang Lin and Dileep Kalathil and P. R. Kumar},
  journal= {arXiv preprint arXiv:2602.10044},
  year   = {2026}
}
