Related papers: Exploration-Enhanced POLITEX

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

In this work, we study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm (a version of regularized policy…

Machine Learning · Computer Science 2021-02-26 Nevena Lazic , Dong Yin , Yasin Abbasi-Yadkori , Csaba Szepesvari

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory…

Machine Learning · Computer Science 2023-01-31 Uri Sherman , Tomer Koren , Yishay Mansour

Adaptive Approximate Policy Iteration

Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited,…

Machine Learning · Computer Science 2021-02-12 Botao Hao , Nevena Lazic , Yasin Abbasi-Yadkori , Pooria Joulani , Csaba Szepesvari

Conservative Exploration in Reinforcement Learning

While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward. Although the agent will…

Machine Learning · Computer Science 2020-07-16 Evrard Garcelon , Mohammad Ghavamzadeh , Alessandro Lazaric , Matteo Pirotta

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin , Gergely Neu , Luca Viano

Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs

We study the problem of infinite-horizon average-reward reinforcement learning with linear Markov decision processes (MDPs). The associated Bellman operator of the problem not being a contraction makes the algorithm design challenging.…

Machine Learning · Statistics 2025-03-12 Kihyuk Hong , Woojin Chae , Yufan Zhang , Dabeen Lee , Ambuj Tewari

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an…

Machine Learning · Computer Science 2022-06-23 Andrew Wagenmaker , Max Simchowitz , Kevin Jamieson

Provably Efficient Exploration in Reward Machines with Low Regret

We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge of the task in the form of reward machines is available to the learner. We consider probabilistic reward machines with…

Machine Learning · Computer Science 2024-12-30 Hippolyte Bourel , Anders Jonsson , Odalric-Ambrym Maillard , Chenxiao Ma , Mohammad Sadegh Talebi

Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

We study the challenging exploration incentive problem in both bandit and reinforcement learning, where the rewards are scale-free and potentially unbounded, driven by real-world scenarios and differing from existing work. Past works in…

Machine Learning · Computer Science 2024-05-07 Mengfan Xu , Diego Klabjan

Satisficing Exploration for Deep Reinforcement Learning

A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world,…

Machine Learning · Computer Science 2024-07-23 Dilip Arumugam , Saurabh Kumar , Ramki Gummadi , Benjamin Van Roy

Reward-Free Exploration for Reinforcement Learning

Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new…

Machine Learning · Computer Science 2020-02-10 Chi Jin , Akshay Krishnamurthy , Max Simchowitz , Tiancheng Yu

TSEB: More Efficient Thompson Sampling for Policy Learning

In model-based solution approaches to the problem of learning in an unknown environment, exploring to learn the model parameters takes a toll on the regret. The optimal performance with respect to regret or PAC bounds is achievable, if the…

Machine Learning · Computer Science 2015-10-13 P. Prasanna , Sarath Chandar , Balaraman Ravindran

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.…

Machine Learning · Computer Science 2020-08-14 Alekh Agarwal , Mikael Henaff , Sham Kakade , Wen Sun

Exploration and Incentives in Reinforcement Learning

How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional…

Machine Learning · Computer Science 2023-02-21 Max Simchowitz , Aleksandrs Slivkins

Regret Analysis of Certainty Equivalence Policies in Continuous-Time Linear-Quadratic Systems

This work theoretically studies a ubiquitous reinforcement learning policy for controlling the canonical model of continuous-time stochastic linear-quadratic systems. We show that randomized certainty equivalent policy addresses the…

Machine Learning · Computer Science 2022-08-23 Mohamad Kazem Shirani Faradonbeh

Sublinear Regret for Learning POMDPs

We study the model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over…

Machine Learning · Computer Science 2022-07-19 Yi Xiong , Ningyuan Chen , Xuefeng Gao , Xiang Zhou

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Policy optimization is a widely-used method in reinforcement learning. Due to its local-search nature, however, theoretical guarantees on global optimality often rely on extra assumptions on the Markov Decision Processes (MDPs) that bypass…

Machine Learning · Computer Science 2021-07-20 Haipeng Luo , Chen-Yu Wei , Chung-Wei Lee

Exploration in Structured Reinforcement Learning

We address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could be potentially exploited to minimize the exploration rates of suboptimal (state, action) pairs. For…

Machine Learning · Computer Science 2018-11-30 Jungseul Ok , Alexandre Proutiere , Damianos Tranos

Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models

We tackle average-reward infinite-horizon POMDPs with an unknown transition model but a known observation model, a setting that has been previously addressed in two limiting ways: (i) frequentist methods relying on suboptimal stochastic…

Machine Learning · Computer Science 2025-09-09 Alessio Russo , Alberto Maria Metelli , Marcello Restelli

Regret-based Reward Elicitation for Markov Decision Processes

The specification of aMarkov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work…

Artificial Intelligence · Computer Science 2012-05-14 Kevin Regan , Craig Boutilier