Related papers: Optimistic Simulated Exploration as an Incentive f…

On Optimistic versus Randomized Exploration in Reinforcement Learning

We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning. Optimistic approaches presented in the literature apply an optimistic boost to the value estimate at each state-action pair and…

Machine Learning · Statistics 2017-06-15 Ian Osband, Benjamin Van Roy

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic…

Machine Learning · Computer Science 2025-03-12 Jasmine Bayrooti, Carl Henrik Ek, Amanda Prorok

Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning

High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under…

Machine Learning · Computer Science 2021-08-02 Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann

Beyond Optimism: Exploration With Partially Observable Rewards

Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration…

Machine Learning · Computer Science 2024-11-12 Simone Parisi, Alireza Kazemipour, Michael Bowling

Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning

Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that…

Machine Learning · Computer Science 2026-02-11 Akshay Mete, Shahid Aamir Sheikh, Tzu-Hsiang Lin, Dileep Kalathil, P. R. Kumar

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty.…

Machine Learning · Computer Science 2020-12-02 Sebastian Curi, Felix Berkenkamp, Andreas Krause

The many faces of optimism - Extended version

The exploration-exploitation dilemma has been an intriguing and unsolved problem within the framework of reinforcement learning. "Optimism in the face of uncertainty" and model building play central roles in advanced exploration methods.…

Artificial Intelligence · Computer Science 2008-10-21 István Szita, András Lőrincz

Learning latent state representation for speeding up exploration

Exploration is an extremely challenging problem in reinforcement learning, especially in high dimensional state and action spaces and when only sparse rewards are available. Effective representations can indicate which components of the…

Machine Learning · Computer Science 2019-05-31 Giulia Vezzani, Abhishek Gupta, Lorenzo Natale, Pieter Abbeel

Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning

The multi-agent setting is intricate and unpredictable since the behaviors of multiple agents influence one another. To address this environmental uncertainty, distributional reinforcement learning algorithms that incorporate uncertainty…

Machine Learning · Computer Science 2023-03-06 Jihwan Oh, Joonkee Kim, Minchan Jeong, Se-Young Yun

Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards

Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful representations, an agent should actively explore…

Machine Learning · Computer Science 2026-04-21 Faisal Mohamed, Catherine Ji, Benjamin Eysenbach, Glen Berseth

A Survey of Exploration Methods in Reinforcement Learning

Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments. Reinforcement learning agents depend crucially on exploration to…

Machine Learning · Computer Science 2021-09-03 Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup

Safety Representations for Safer Policy Learning

Reinforcement learning algorithms typically necessitate extensive exploration of the state space to find optimal policies. However, in safety-critical applications, the risks associated with such exploration can lead to catastrophic…

Machine Learning · Computer Science 2025-02-28 Kaustubh Mani, Vincent Mai, Charlie Gauthier, Annie Chen, Samer Nashed, Liam Paull

Fast active learning for pure exploration in reinforcement learning

Realistic environments often provide agents with very limited feedback. When the environment is initially unknown, the feedback, in the beginning, can be completely absent, and the agents may first choose to devote all their effort on…

Machine Learning · Computer Science 2020-10-13 Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Emilie Kaufmann, Edouard Leurent, Michal Valko

Directed Exploration for Reinforcement Learning

Efficient exploration is necessary to achieve good sample efficiency for reinforcement learning in general. From small, tabular settings such as gridworlds to large, continuous and sparse reward settings such as robotic object manipulation…

Machine Learning · Computer Science 2019-06-20 Zhaohan Daniel Guo, Emma Brunskill

EMI: Exploration with Mutual Information

Reinforcement learning algorithms struggle when the reward signal is very sparse. In these cases, naive random exploration methods essentially rely on a random walk to stumble onto a rewarding state. Recent works utilize intrinsic…

Machine Learning · Computer Science 2019-06-14 Hyoungseok Kim, Jaekyeom Kim, Yeonwoo Jeong, Sergey Levine, Hyun Oh Song

Meta-Learning to Explore via Memory Density Feedback

Exploration algorithms for reinforcement learning typically replace or augment the reward function with an additional "intrinsic" reward that trains the agent to seek previously unseen states of the environment. Here, we consider an…

Machine Learning · Computer Science 2025-09-30 Kevin McKee, Eric Alt, Andrew Grebenisan, Mick van Gelderen, Gary Miguel

Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL

In order to mitigate the sample complexity of real-world reinforcement learning, common practice is to first train a policy in a simulator where samples are cheap, and then deploy this policy in the real world, with the hope that it…

Machine Learning · Computer Science 2024-10-29 Andrew Wagenmaker, Kevin Huang, Liyiming Ke, Byron Boots, Kevin Jamieson, Abhishek Gupta

Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation

Reinforcement Learning has emerged as a strong alternative to solve optimization tasks efficiently. The use of these algorithms highly depends on the feedback signals provided by the environment in charge of informing about how good (or…

Machine Learning · Computer Science 2022-12-01 Alain Andres, Esther Villar-Rodriguez, Javier Del Ser

Exploration and Incentives in Reinforcement Learning

How do you incentivize self-interested agents to explore when they prefer to exploit? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional…

Machine Learning · Computer Science 2023-02-21 Max Simchowitz, Aleksandrs Slivkins

Explicit Explore-Exploit Algorithms in Continuous State Spaces

We present a new model-based algorithm for reinforcement learning (RL) which consists of explicit exploration and exploitation phases, and is applicable in large or infinite state spaces. The algorithm maintains a set of dynamics models…

Machine Learning · Computer Science 2019-12-03 Mikael Henaff