Related papers: Optimistic Exploration even with a Pessimistic Ini…

Domain-Independent Optimistic Initialization for Reinforcement Learning

In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration. However, such an approach generally depends on the domain, viz., the scale of the rewards must be known, and the…

Machine Learning · Computer Science 2014-10-20 Marlos C. Machado, Sriram Srinivasan, Michael Bowling

On Optimistic versus Randomized Exploration in Reinforcement Learning

We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning. Optimistic approaches presented in the literature apply an optimistic boost to the value estimate at each state-action pair and…

Machine Learning · Statistics 2017-06-15 Ian Osband, Benjamin Van Roy

Minimax Optimal Reinforcement Learning with Quasi-Optimism

In our quest for a reinforcement learning (RL) algorithm that is both practical and provably optimal, we introduce EQO (Exploration via Quasi-Optimism). Unlike existing minimax optimal approaches, EQO avoids reliance on empirical variances…

Machine Learning · Computer Science 2025-07-29 Harin Lee, Min-hwan Oh

Efficient Reinforcement Learning via Decoupling Exploration and Utilization

Reinforcement Learning (RL), recognized as an efficient learning approach, has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles. Classical single-agent reinforcement…

Machine Learning · Computer Science 2024-05-13 Jingpu Yang, Helin Wang, Qirui Zhao, Zhecheng Shi, Zirui Song, Miao Fang

Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity

Offline or batch reinforcement learning seeks to learn a near-optimal policy using history data without active exploration of the environment. To counter the insufficient coverage and sample scarcity of many offline datasets, the principle…

Machine Learning · Computer Science 2022-06-14 Laixi Shi, Gen Li, Yuting Wei, Yuxin Chen, Yuejie Chi

Beyond Optimism: Exploration With Partially Observable Rewards

Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration…

Machine Learning · Computer Science 2024-11-12 Simone Parisi, Alireza Kazemipour, Michael Bowling

Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning

Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that…

Machine Learning · Computer Science 2026-02-11 Akshay Mete, Shahid Aamir Sheikh, Tzu-Hsiang Lin, Dileep Kalathil, P. R. Kumar

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty.…

Machine Learning · Computer Science 2020-12-02 Sebastian Curi, Felix Berkenkamp, Andreas Krause

Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization

Exploration remains a key challenge in deep reinforcement learning (RL). Optimism in the face of uncertainty is a well-known heuristic with theoretical guarantees in the tabular setting, but how best to translate the principle to deep…

Machine Learning · Computer Science 2023-06-06 Brendan O'Donoghue

Tactical Optimism and Pessimism for Deep Reinforcement Learning

In recent years, deep off-policy actor-critic algorithms have become a dominant approach to reinforcement learning for continuous control. One of the primary drivers of this improved performance is the use of pessimistic value updates to…

Machine Learning · Computer Science 2022-04-07 Ted Moskovitz, Jack Parker-Holder, Aldo Pacchiano, Michael Arbel, Michael I. Jordan

Towards Tractable Optimism in Model-Based Reinforcement Learning

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate…

Machine Learning · Computer Science 2021-12-07 Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

Principled Exploration via Optimistic Bootstrapping and Backward Induction

One principled approach for provably efficient exploration is incorporating the upper confidence bound (UCB) into the value function as a bonus. However, UCB is specified to deal with linear and tabular settings and is incompatible with…

Machine Learning · Computer Science 2021-05-18 Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang

Semi-pessimistic Reinforcement Learning

Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected data. However, it faces challenges of distributional shift, where the learned policy may encounter unseen scenarios not covered in the offline data.…

Machine Learning · Computer Science 2025-05-27 Jin Zhu, Xin Zhou, Jiaang Yao, Gholamali Aminian, Omar Rivasplata, Simon Little, Lexin Li, Chengchun Shi

Optimistic Simulated Exploration as an Incentive for Real Exploration

Many reinforcement learning exploration techniques are overly optimistic and try to explore every state. Such exploration is impossible in environments with an unlimited number of states. I propose to use simulated exploration with an…

Machine Learning · Computer Science 2009-05-20 Ivo Danihelka

Pessimistic Auxiliary Policy for Offline Reinforcement Learning

Offline reinforcement learning aims to learn an agent from pre-collected datasets, avoiding unsafe and inefficient real-time interaction. However, inevitable access to out-of-distribution actions during the learning process introduces…

Artificial Intelligence · Computer Science 2026-03-06 Fan Zhang, Baoru Huang, Xin Zhang

Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning

The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, conventional methods based on CTDE can suffer from value underestimation and converge to…

Multiagent Systems · Computer Science 2026-05-05 Ruoning Zhang, Siying Wang, Wenyu Chen, Yang Zhou, Zhitong Zhao, Zixuan Zhang, Ruijie Zhang, Stefano V. Albrecht

Optimistic Online LQR via Intrinsic Rewards

Optimism in the face of uncertainty is a popular approach to balance exploration and exploitation in reinforcement learning. Here, we consider the online linear quadratic regulator (LQR) problem, i.e., to learn the LQR corresponding to an…

Systems and Control · Electrical Eng. & Systems 2026-04-01 Marcell Bartos, Bruce D. Lee, Lenart Treven, Andreas Krause, Florian Dörfler, Melanie N. Zeilinger

Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration…

Machine Learning · Computer Science 2023-10-13 Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn

Optimistic Active Exploration of Dynamical Systems

Reinforcement learning algorithms commonly seek to optimize policies for solving one particular task. How should we explore an unknown dynamical system such that the estimated model globally approximates the dynamics and allows us to solve…

Machine Learning · Computer Science 2023-10-31 Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, Sebastian Blaes, Stelian Coros, Andreas Krause

Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping

In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of the linear transformation is equivalent to changing the…

Machine Learning · Computer Science 2022-10-18 Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou