Related papers: Optimistic PAC Reinforcement Learning: the Instanc…

Beyond No Regret: Instance-Dependent PAC Reinforcement Learning

The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an…

Machine Learning · Computer Science 2022-06-23 Andrew Wagenmaker, Max Simchowitz, Kevin Jamieson

Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs

In probably approximately correct (PAC) reinforcement learning (RL), an agent is required to identify an $\epsilon$-optimal policy with probability $1-\delta$. While minimax optimal algorithms exist for this problem, its instance-dependent…

Machine Learning · Computer Science 2022-10-25 Andrea Tirinzoni, Aymen Al-Marjani, Emilie Kaufmann

Towards Instance-Optimality in Online PAC Reinforcement Learning

Several recent works have proposed instance-dependent upper bounds on the number of episodes needed to identify, with probability $1-\delta$, an $\varepsilon$-optimal policy in finite-horizon tabular Markov Decision Processes (MDPs). These…

Machine Learning · Statistics 2023-11-13 Aymen Al-Marjani, Andrea Tirinzoni, Emilie Kaufmann

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms…

Machine Learning · Computer Science 2018-01-03 Christoph Dann, Tor Lattimore, Emma Brunskill

Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design

While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true…

Machine Learning · Computer Science 2023-07-21 Andrew Wagenmaker, Kevin Jamieson

Instance-optimal PAC Algorithms for Contextual Bandits

In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic…

Machine Learning · Statistics 2023-10-04 Zhaoqi Li, Lillian Ratliff, Houssam Nassif, Kevin Jamieson, Lalit Jain

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired…

Machine Learning · Statistics 2022-01-24 Koulik Khamaru, Eric Xia, Martin J. Wainwright, Michael I. Jordan

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance based confidence interval. We show that the proposed algorithm, UCRL-V, achieves…

Machine Learning · Computer Science 2019-12-12 Aristide Tossou, Debabrota Basu, Christos Dimitrakakis

Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the…

Machine Learning · Computer Science 2021-10-27 Christoph Dann, Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

Optimistic Reinforcement Learning with Quantile Objectives

Reinforcement Learning (RL) has achieved tremendous success in recent years. However, the classical foundations of RL do not account for the risk sensitivity of the objective function, which is critical in various fields, including…

Machine Learning · Computer Science 2025-11-14 Mohammad Alipour-Vaezi, Huaiyang Zhong, Kwok-Leung Tsui, Sajad Khodadadian

PAC Bounds for Discounted MDPs

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). For the upper bound we make the assumption that each action leads to at most two…

Machine Learning · Computer Science 2013-05-17 Tor Lattimore, Marcus Hutter

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

Past research on interactive decision making problems (bandits, reinforcement learning, etc.) mostly focuses on the minimax regret that measures the algorithm's performance on the hardest instance. However, an ideal algorithm should adapt…

Machine Learning · Computer Science 2023-06-13 Kefan Dong, Tengyu Ma

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While…

Machine Learning · Computer Science 2020-10-08 Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing…

Machine Learning · Statistics 2026-02-25 Haochen Zhang, Zhong Zheng, Lingzhou Xue

Tail Distribution of Regret in Optimistic Reinforcement Learning

We derive instance-dependent tail bounds for the regret of optimism-based reinforcement learning in finite-horizon tabular Markov decision processes with unknown transition dynamics. We first study a UCBVI-type (model-based) algorithm and…

Machine Learning · Computer Science 2026-03-18 Sajad Khodadadian, Mehrdad Moharrami

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP

We present regret minimization algorithms for stochastic contextual MDPs under minimum reachability assumption, using an access to an offline least square regression oracle. We analyze three different settings: where the dynamics is known,…

Machine Learning · Computer Science 2023-01-24 Orin Levy, Yishay Mansour

Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty.…

Machine Learning · Computer Science 2020-12-02 Sebastian Curi, Felix Berkenkamp, Andreas Krause

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the…

Machine Learning · Computer Science 2020-10-09 Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

Instance-Dependent Continuous-Time Reinforcement Learning via Maximum Likelihood Estimation

Continuous-time reinforcement learning (CTRL) provides a natural framework for sequential decision-making in dynamic environments where interactions evolve continuously over time. While CTRL has shown growing empirical success, its ability…

Machine Learning · Computer Science 2025-12-04 Runze Zhao, Yue Yu, Ruhan Wang, Chunfeng Huang, Dongruo Zhou