Related papers: Optimistic World Models: Efficient Exploration in …

200 papers

Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic…

Machine Learning · Computer Science 2025-03-12 Jasmine Bayrooti, Carl Henrik Ek, Amanda Prorok

Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty.…

Machine Learning · Computer Science 2020-12-02 Sebastian Curi, Felix Berkenkamp, Andreas Krause

Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration…

Machine Learning · Computer Science 2024-11-12 Simone Parisi, Alireza Kazemipour, Michael Bowling

Many reinforcement learning exploration techniques are overly optimistic and try to explore every state. Such exploration is impossible in environments with an unlimited number of states. I propose to use simulated exploration with an…

Machine Learning · Computer Science 2009-05-20 Ivo Danihelka

We address the challenge of efficient exploration in model-based reinforcement learning (MBRL), where the system dynamics are unknown and the RL agent must learn directly from online interactions. We propose Scalable and Optimistic MBRL…

Machine Learning · Computer Science 2025-11-26 Bhavya Sukhija, Lenart Treven, Carmelo Sferrazza, Florian Dörfler, Pieter Abbeel, Andreas Krause

Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep RL algorithms do not use…

Machine Learning · Computer Science 2020-02-28 Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson

Reinforcement Learning (RL) has made significant strides in complex tasks but struggles in multi-task settings with different embodiments. World model methods offer scalability by learning a simulation of the environment but often rely on…

Machine Learning · Computer Science 2025-02-25 Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg

Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of…

Machine Learning · Computer Science 2024-03-01 Siliang Zeng, Chenliang Li, Alfredo Garcia, Mingyi Hong

One principled approach for provably efficient exploration is incorporating the upper confidence bound (UCB) into the value function as a bonus. However, UCB is specified to deal with linear and tabular settings and is incompatible with…

Machine Learning · Computer Science 2021-05-18 Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate…

Machine Learning · Computer Science 2021-12-07 Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

The exploration-exploitation dilemma has been an intriguing and unsolved problem within the framework of reinforcement learning. "Optimism in the face of uncertainty" and model building play central roles in advanced exploration methods.…

Artificial Intelligence · Computer Science 2008-10-21 István Szita, András Lőrincz

Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we study the problem of continuous-time reinforcement…

Machine Learning · Computer Science 2026-03-03 Klemens Iten, Lenart Treven, Bhavya Sukhija, Florian Dörfler, Andreas Krause

This paper introduces a simple, efficient learning algorithm for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation, which is thus named OMLE. We prove…

Machine Learning · Computer Science 2022-11-24 Qinghua Liu, Praneeth Netrapalli, Csaba Szepesvári, Chi Jin

Reinforcement learning (RL) has emerged as a powerful method for improving the reasoning abilities of large language models (LLMs). Outcome-based RL, which rewards policies solely for the correctness of the final answer, yields substantial…

Machine Learning · Computer Science 2025-09-09 Yuda Song, Julia Kempe, Remi Munos

Learning complex robot behaviors through interaction requires structured exploration. Planning should target interactions with the potential to optimize long-term performance, while only reducing uncertainty where conducive to this…

Machine Learning · Computer Science 2021-12-14 Tim Seyde, Wilko Schwarting, Sertac Karaman, Daniela Rus

The applicability of reinforcement learning (RL) algorithms in real-world domains often requires adherence to safety constraints, a need difficult to address given the asymptotic nature of the classic RL optimization objective. In contrast…

Machine Learning · Computer Science 2021-04-15 Moritz A. Zanger, Karam Daaboul, J. Marius Zöllner

Efficient exploration remains a challenging problem in reinforcement learning, especially for those tasks where rewards from environments are sparse. A commonly used approach for exploring such environments is to introduce some "intrinsic"…

Machine Learning · Computer Science 2020-07-16 Neale Ratzlaff, Qinxun Bai, Li Fuxin, Wei Xu

Behavior Foundation Models (BFMs) are capable of retrieving high-performing policy for any reward function specified directly at test-time, commonly referred to as zero-shot reinforcement learning (RL). While this is a very efficient…

Machine Learning · Computer Science 2026-03-03 Thomas Rupf, Marco Bagatella, Marin Vlastelica, Andreas Krause

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents…

Machine Learning · Computer Science 2023-11-01 Lenart Treven, Jonas Hübotter, Bhavya Sukhija, Florian Dörfler, Andreas Krause