Related papers: Optimistic Reinforcement Learning with Quantile Ob…

200 papers

We study non-stationary reinforcement learning (RL) under distribution shifts in both finite-horizon episodic and infinite-horizon discounted Markov Decision Processes (MDPs). In the finite-horizon case, the transition functions may…

Machine Learning · Computer Science 2026-03-31 Ha Manh Bui, Felix Parker, Kimia Ghobadi, Anqi Liu

While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the…

Quantum Physics · Physics 2024-06-14 Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, Liwei Wang

In this paper, we investigate the problem of episodic reinforcement learning with quantum oracles for state evolution. To this end, we propose an Upper Confidence Bound (UCB) based quantum algorithmic framework to…

Machine Learning · Computer Science 2023-02-20 Bhargav Ganguly, Yulian Wu, Di Wang, Vaneet Aggarwal

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint…

Machine Learning · Computer Science 2011-09-22 Sarah Filippi, Olivier Cappé, Aurélien Garivier

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an…

Machine Learning · Computer Science 2022-06-02 Sanae Amani, Lin F. Yang, Ching-An Cheng

In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$. Starting with multi-arm bandits (MABs), we show the minimax CVaR regret rate is…

Machine Learning · Computer Science 2023-05-26 Kaiwen Wang, Nathan Kallus, Wen Sun

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with…

Machine Learning · Computer Science 2025-10-06 Heyang Zhao, Jiafan He, Quanquan Gu

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance based confidence interval. We show that the proposed algorithm, UCRL-V, achieves…

Machine Learning · Computer Science 2019-12-12 Aristide Tossou, Debabrota Basu, Christos Dimitrakakis

We study the problem of reinforcement learning in infinite-horizon discounted linear Markov decision processes (MDPs), and propose the first computationally efficient algorithm achieving rate-optimal regret guarantees in this setting. Our…

Machine Learning · Computer Science 2026-03-16 Antoine Moulin, Gergely Neu, Luca Viano

To bridge the gap between empirical success and theoretical understanding in transfer reinforcement learning (RL), we study a principled approach with provable performance guarantees. We introduce a novel composite MDP framework where…

Machine Learning · Statistics 2025-02-04 Jinhang Chai, Elynn Chen, Lin Yang

We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective…

Machine Learning · Computer Science 2020-06-26 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the…

Machine Learning · Computer Science 2022-03-25 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate…

Machine Learning · Computer Science 2021-12-07 Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret…

Machine Learning · Computer Science 2022-10-12 O. Bastani, Y. J. Ma, E. Shen, W. Xu

We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Particularly, our work…

Machine Learning · Computer Science 2024-07-11 Dake Zhang, Boxiang Lyu, Shuang Qiu, Mladen Kolar, Tong Zhang

Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to…

Machine Learning · Computer Science 2024-12-24 Zhengqi Wu, Renyuan Xu

We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon, S. Rasoul Etesami

We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work studying risk-sensitive RL focuses on the tabular Markov Decision…

Machine Learning · Computer Science 2023-11-21 Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee

We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under temporal drifts, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total…

Machine Learning · Computer Science 2020-05-19 Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The multi-batch reinforcement learning framework, where the agent…

Machine Learning · Computer Science 2022-10-18 Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji