Related papers: Provably Efficient Reinforcement Learning with Lin…

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based…

Machine Learning · Computer Science 2023-01-10 Arnob Ghosh, Xingyu Zhou, Ness Shroff

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s,a)$. Specifically, we consider the episodic…

Machine Learning · Computer Science 2023-01-31 Pihe Hu, Yu Chen, Longbo Huang

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret bound, where $T$ is the number of interactions with the MDP.…

Machine Learning · Computer Science 2021-02-19 Jiafan He, Dongruo Zhou, Quanquan Gu

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an…

Machine Learning · Computer Science 2022-06-02 Sanae Amani, Lin F. Yang, Ching-An Cheng

Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing…

Machine Learning · Statistics 2026-02-25 Haochen Zhang, Zhong Zheng, Lingzhou Xue

Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning

In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The multi-batch reinforcement learning framework, where the agent…

Machine Learning · Computer Science 2022-10-18 Zihan Zhang, Yuhang Jiang, Yuan Zhou, Xiangyang Ji

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given…

Machine Learning · Computer Science 2023-11-07 Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu

Nonstationary Reinforcement Learning with Linear Function Approximation

We consider reinforcement learning (RL) in episodic Markov decision processes (MDPs) with linear function approximation under drifting environment. Specifically, both the reward and state transition functions can evolve over time but their…

Machine Learning · Computer Science 2024-04-16 Huozhi Zhou, Jinglin Chen, Lav R. Varshney, Ashish Jagmohan

Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs

Reinforcement learning algorithms are usually stated without theoretical guarantees regarding their performance. Recently, Jin, Yang, Wang, and Jordan (COLT 2020) showed a polynomial-time reinforcement learning algorithm (namely, LSVI-UCB)…

Machine Learning · Computer Science 2024-11-19 Philips George John, Arnab Bhattacharyya, Silviu Maniu, Dimitrios Myrisiotis, Zhenan Wu

Online Model Selection for Reinforcement Learning with Function Approximation

Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and…

Machine Learning · Computer Science 2020-11-20 Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Achieving Constant Regret in Linear Markov Decision Processes

We study the constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that incurs only finite regret over infinite episodes with high probability. We introduce an algorithm, Cert-LSVI-UCB, for…

Machine Learning · Computer Science 2024-12-13 Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu

Online Reinforcement Learning in Markov Decision Process Using Linear Programming

We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon, S. Rasoul Etesami

A General Framework for Sequential Decision-Making under Adaptivity Constraints

We take the first step in studying general sequential decision-making under two adaptivity constraints: rare policy switch and batch learning. First, we provide a general class called the Eluder Condition class, which includes a wide range…

Machine Learning · Computer Science 2023-12-07 Nuoya Xiong, Zhaoran Wang, Zhuoran Yang

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model (Jia et al., 2020; Ayoub et al., 2020; Zhou et al.,…

Machine Learning · Computer Science 2021-01-08 Dongruo Zhou, Quanquan Gu, Csaba Szepesvari

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample or time-efficient algorithms for RL with large state-action space…

Machine Learning · Computer Science 2024-03-08 Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang

A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints

In many applications of Reinforcement Learning (RL), it is critically important that the algorithm performs safely, such that instantaneous hard constraints are satisfied at each step, and unsafe states and actions are avoided. However,…

Machine Learning · Computer Science 2023-02-10 Ming Shi, Yingbin Liang, Ness Shroff

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

Learning Markov decision processes (MDPs) in the presence of the adversary is a challenging problem in reinforcement learning (RL). In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the…

Machine Learning · Computer Science 2022-04-21 Jiafan He, Dongruo Zhou, Quanquan Gu

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with…

Machine Learning · Computer Science 2025-10-06 Heyang Zhao, Jiafan He, Quanquan Gu

Provably Efficient Reinforcement Learning with Linear Function Approximation

Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of…

Machine Learning · Computer Science 2019-08-09 Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret

We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility. We propose two provably…

Machine Learning · Computer Science 2020-06-25 Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie