Related papers: Optimistic Reinforcement Learning with Quantile Ob…

Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning

We study the Non-Stationary Reinforcement Learning (RL) under distribution shifts in both finite-horizon episodic and infinite-horizon discounted Markov Decision Processes (MDPs). In the finite-horizon case, the transition functions may…

Machine Learning · Computer Science 2026-03-31 Ha Manh Bui, Felix Parker, Kimia Ghobadi, Anqi Liu

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the…

Quantum Physics · Physics 2024-06-14 Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, Liwei Wang

Quantum Computing Provides Exponential Regret Improvement in Episodic Reinforcement Learning

In this paper, we investigate the problem of episodic reinforcement learning with quantum oracles for state evolution. To this end, we propose an Upper Confidence Bound (UCB) based quantum algorithmic framework to…

Machine Learning · Computer Science 2023-02-20 Bhargav Ganguly, Yulian Wu, Di Wang, Vaneet Aggarwal

Optimism in Reinforcement Learning and Kullback-Leibler Divergence

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint…

Machine Learning · Computer Science 2011-09-22 Sarah Filippi, Olivier Cappé, Aurélien Garivier

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an…

Machine Learning · Computer Science 2022-06-02 Sanae Amani, Lin F. Yang, Ching-An Cheng

Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is…

Machine Learning · Computer Science 2023-05-26 Kaiwen Wang, Nathan Kallus, Wen Sun

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with…

Machine Learning · Computer Science 2025-10-06 Heyang Zhao, Jiafan He, Quanquan Gu

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance based confidence interval. We show that the proposed algorithm, UCRL-V, achieves…

Machine Learning · Computer Science 2019-12-12 Aristide Tossou, Debabrota Basu, Christos Dimitrakakis

Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

Transition Transfer $Q$-Learning for Composite Markov Decision Processes

To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework where…

Machine Learning · Statistics 2025-02-04 Jinhang Chai, Elynn Chen, Lin Yang

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective…

Machine Learning · Computer Science 2020-06-26 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the…

Machine Learning · Computer Science 2022-03-25 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

Towards Tractable Optimism in Model-Based Reinforcement Learning

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate…

Machine Learning · Computer Science 2021-12-07 Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

Regret Bounds for Risk-Sensitive Reinforcement Learning

In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret…

Machine Learning · Computer Science 2022-10-12 O. Bastani, Y. J. Ma, E. Shen, W. Xu

Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Particularly, our work…

Machine Learning · Computer Science 2024-07-11 Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to…

Machine Learning · Computer Science 2024-12-24 Zhengqi Wu, Renyuan Xu

Online Reinforcement Learning in Markov Decision Process Using Linear Programming

We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon, S. Rasoul Etesami

Provably Efficient CVaR RL in Low-rank MDPs

We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work studying risk-sensitive RL focuses on the tabular Markov Decision…

Machine Learning · Computer Science 2023-11-21 Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee

Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism

We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under temporal drifts, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total…

Machine Learning · Computer Science 2020-05-19 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning

In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The multi-batch reinforcement learning framework, where the agent…

Machine Learning · Computer Science 2022-10-18 Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji