Related papers: Provably Efficient Model-Free Constrained RL with …

Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

We study reinforcement learning (RL) with linear function approximation under the adaptivity constraint. We consider two popular limited adaptivity models: the batch learning model and the rare policy switch model, and propose two efficient…

Machine Learning · Computer Science 2022-01-04 Tianhao Wang, Dongruo Zhou, Quanquan Gu

Nonstationary Reinforcement Learning with Linear Function Approximation

We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation under drifting environment. Specifically, both the reward and state transition functions can evolve over time but their…

Machine Learning · Computer Science 2024-04-16 Huozhi Zhou, Jinglin Chen, Lav R. Varshney, Ashish Jagmohan

Achieving Constant Regret in Linear Markov Decision Processes

We study the constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that incurs only finite regret over infinite episodes with high probability. We introduce an algorithm, Cert-LSVI-UCB, for…

Machine Learning · Computer Science 2024-12-13 Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret bound, where $T$ is the number of interactions with the MDP.…

Machine Learning · Computer Science 2021-02-19 Jiafan He, Dongruo Zhou, Quanquan Gu

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s,a)$. Specifically, we consider the episodic…

Machine Learning · Computer Science 2023-01-31 Pihe Hu, Yu Chen, Longbo Huang

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an…

Machine Learning · Computer Science 2022-06-02 Sanae Amani, Lin F. Yang, Ching-An Cheng

Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

We study the reinforcement learning (RL) problem in a constrained Markov decision process (CMDP), where an agent explores the environment to maximize the expected cumulative reward while satisfying a single constraint on the expected total…

Machine Learning · Computer Science 2026-01-29 Toshinori Kitamura, Arnob Ghosh, Tadashi Kozuno, Wataru Kumagai, Kazumi Kasaura, Kenta Hoshino, Yohei Hosoe, Yutaka Matsuo

Provably Efficient Model-Free Algorithms for Non-stationary CMDPs

We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the…

Machine Learning · Computer Science 2023-03-13 Honghao Wei, Arnob Ghosh, Ness Shroff, Lei Ying, Xingyu Zhou

Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing…

Machine Learning · Statistics 2026-02-25 Haochen Zhang, Zhong Zheng, Lingzhou Xue

Provably Efficient Reinforcement Learning with Linear Function Approximation

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of…

Machine Learning · Computer Science 2019-08-09 Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

Model-Free Non-Stationary RL: Near-Optimal Regret and Applications in Multi-Agent RL and Inventory Control

We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do…

Machine Learning · Computer Science 2022-08-23 Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Başar

Online Model Selection for Reinforcement Learning with Function Approximation

Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and…

Machine Learning · Computer Science 2020-11-20 Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs

Reinforcement learning algorithms are usually stated without theoretical guarantees regarding their performance. Recently, Jin, Yang, Wang, and Jordan (COLT 2020) showed a polynomial-time reinforcement learning algorithm (namely, LSVI-UCB)…

Machine Learning · Computer Science 2024-11-19 Philips George John, Arnab Bhattacharyya, Silviu Maniu, Dimitrios Myrisiotis, Zhenan Wu

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches,…

Machine Learning · Computer Science 2021-10-27 Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang

Provably Efficient Model-free RL in Leader-Follower MDP with Linear Function Approximation

We consider a multi-agent episodic MDP setup where an agent (leader) takes action at each step of the episode followed by another agent (follower). The state evolution and rewards depend on the joint action pair of the leader and the…

Machine Learning · Computer Science 2023-01-10 Arnob Ghosh

Model-Free Linear Quadratic Control via Reduction to Expert Prediction

Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to…

Machine Learning · Computer Science 2018-10-09 Yasin Abbasi-Yadkori, Nevena Lazic, Csaba Szepesvari

Safe Reinforcement Learning with Linear Function Approximation

Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to…

Machine Learning · Computer Science 2021-06-14 Sanae Amani, Christos Thrampoulidis, Lin F. Yang

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample or time-efficient algorithms for RL with large state-action space…

Machine Learning · Computer Science 2024-03-08 Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend…

Machine Learning · Computer Science 2025-07-25 Yilie Huang, Yanwei Jia, Xun Yu Zhou

Learning in Markov Decision Processes under Constraints

We consider reinforcement learning (RL) in Markov Decision Processes in which an agent repeatedly interacts with an environment that is modeled by a controlled Markov process. At each time step $t$, it earns a reward, and also incurs a…

Machine Learning · Computer Science 2023-03-16 Rahul Singh, Abhishek Gupta, Ness B. Shroff