Related papers: Optimistic World Models: Efficient Exploration in …

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic…

Machine Learning · Computer Science 2025-03-12 Jasmine Bayrooti, Carl Henrik Ek, Amanda Prorok

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty.…

Machine Learning · Computer Science 2020-12-02 Sebastian Curi, Felix Berkenkamp, Andreas Krause

Beyond Optimism: Exploration With Partially Observable Rewards

Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration…

Machine Learning · Computer Science 2024-11-12 Simone Parisi, Alireza Kazemipour, Michael Bowling

Optimistic Simulated Exploration as an Incentive for Real Exploration

Many reinforcement learning exploration techniques are overly optimistic and try to explore every state. Such exploration is impossible in environments with an unlimited number of states. I propose to use simulated exploration with an…

Machine Learning · Computer Science 2009-05-20 Ivo Danihelka

SOMBRL: Scalable and Optimistic Model-Based RL

We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose Scalable and Optimistic MBRL…

Machine Learning · Computer Science 2025-11-26 Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Florian Dörfler, Pieter Abbeel, Andreas Krause

Optimistic Exploration even with a Pessimistic Initialisation

Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep RL algorithms do not use…

Machine Learning · Computer Science 2020-02-28 Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson

PWM: Policy Learning with Multi-Task World Models

Reinforcement Learning (RL) has made significant strides in complex tasks but struggles in multi-task settings with different embodiments. World model methods offer scalability by learning a simulation of the environment but often rely on…

Machine Learning · Computer Science 2025-02-25 Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg

When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning

Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of…

Machine Learning · Computer Science 2024-03-01 Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong

Principled Exploration via Optimistic Bootstrapping and Backward Induction

One principled approach for provably efficient exploration is incorporating the upper confidence bound (UCB) into the value function as a bonus. However, UCB is designed for linear and tabular settings and is incompatible with…

Machine Learning · Computer Science 2021-05-18 Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang

Towards Tractable Optimism in Model-Based Reinforcement Learning

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate…

Machine Learning · Computer Science 2021-12-07 Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

The many faces of optimism - Extended version

The exploration-exploitation dilemma has been an intriguing and unsolved problem within the framework of reinforcement learning. "Optimism in the face of uncertainty" and model building play central roles in advanced exploration methods.…

Artificial Intelligence · Computer Science 2008-10-21 István Szita, András Lőrincz

Sample-efficient and Scalable Exploration in Continuous-Time RL

Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we study the problem of continuous-time reinforcement…

Machine Learning · Computer Science 2026-03-03 Klemens Iten, Lenart Treven, Bhavya Sukhija, Florian Dörfler, Andreas Krause

Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

This paper introduces a simple, efficient learning algorithm for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation and is therefore named OMLE. We prove…

Machine Learning · Computer Science 2022-11-24 Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin

Outcome-based Exploration for LLM Reasoning

Reinforcement learning (RL) has emerged as a powerful method for improving the reasoning abilities of large language models (LLMs). Outcome-based RL, which rewards policies solely for the correctness of the final answer, yields substantial…

Machine Learning · Computer Science 2025-09-09 Yuda Song, Julia Kempe, Remi Munos

Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles

Learning complex robot behaviors through interaction requires structured exploration. Planning should target interactions with the potential to optimize long-term performance, while only reducing uncertainty where conducive to this…

Machine Learning · Computer Science 2021-12-14 Tim Seyde, Wilko Schwarting, Sertac Karaman, Daniela Rus

Safe Continuous Control with Constrained Model-Based Policy Optimization

The applicability of reinforcement learning (RL) algorithms in real-world domains often requires adherence to safety constraints, a need difficult to address given the asymptotic nature of the classic RL optimization objective. In contrast…

Machine Learning · Computer Science 2021-04-15 Moritz A. Zanger, Karam Daaboul, J. Marius Zöllner

Implicit Generative Modeling for Efficient Exploration

Efficient exploration remains a challenging problem in reinforcement learning, especially for those tasks where rewards from environments are sparse. A commonly used approach for exploring such environments is to introduce some "intrinsic"…

Machine Learning · Computer Science 2020-07-16 Neale Ratzlaff, Qinxun Bai, Li Fuxin, Wei Xu

Optimistic Task Inference for Behavior Foundation Models

Behavior Foundation Models (BFMs) are capable of retrieving a high-performing policy for any reward function specified directly at test time, which is commonly referred to as zero-shot reinforcement learning (RL). While this is a very efficient…

Machine Learning · Computer Science 2026-03-03 Thomas Rupf, Marco Bagatella, Marin Vlastelica, Andreas Krause

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Efficient Exploration in Continuous-time Model-based Reinforcement Learning

Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents…

Machine Learning · Computer Science 2023-11-01 Lenart Treven, Jonas Hübotter, Bhavya Sukhija, Florian Dörfler, Andreas Krause