Related papers: Online Model Selection for Reinforcement Learning …

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural…

Machine Learning · Computer Science 2021-01-01 Zhuoran Yang , Chi Jin , Zhaoran Wang , Mengdi Wang , Michael I. Jordan

Reinforcement Learning algorithms for regret minimization in structured Markov Decision Processes

A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of actions or a policy to maximize the reward collected or minimize the regret incurred in a finite time horizon. For several RL problems in operation…

Machine Learning · Computer Science 2016-08-18 K J Prabuchandran , Tejas Bodas , Theja Tulabandhula

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang , Yiming Zong , Yinyu Ye

Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation

Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case…

Machine Learning · Statistics 2026-05-28 Wonyoung Kim , Min-Hwan Oh , Garud Iyengar , Assaf Zeevi

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based…

Machine Learning · Computer Science 2023-01-10 Arnob Ghosh , Xingyu Zhou , Ness Shroff

Near-optimal Reinforcement Learning in Factored MDPs

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action…

Machine Learning · Statistics 2014-11-04 Ian Osband , Benjamin Van Roy

Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret bound, where $T$ is the number of interactions with the MDP.…

Machine Learning · Computer Science 2021-02-19 Jiafan He , Dongruo Zhou , Quanquan Gu

Regret Balancing for Bandit and RL Model Selection

We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…

Machine Learning · Computer Science 2020-06-11 Yasin Abbasi-Yadkori , Aldo Pacchiano , My Phan

A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints

In many applications of Reinforcement Learning (RL), it is critically important that the algorithm performs safely, such that instantaneous hard constraints are satisfied at each step, and unsafe states and actions are avoided. However,…

Machine Learning · Computer Science 2023-02-10 Ming Shi , Yingbin Liang , Ness Shroff

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle which is able to learn Reinforcement Learning (RL) modeled by Markov decision process (MDP) with finite state-action space efficiently. By…

Machine Learning · Computer Science 2020-01-01 Zihan Zhang , Xiangyang Ji

Regret Bounds for Reinforcement Learning with Policy Advice

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which…

Machine Learning · Statistics 2013-07-19 Mohammad Gheshlaghi Azar , Alessandro Lazaric , Emma Brunskill

Model Selection in Reinforcement Learning with General Function Approximations

We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know…

Machine Learning · Statistics 2022-07-08 Avishek Ghosh , Sayak Ray Chowdhury

Test-Time Regret Minimization in Meta Reinforcement Learning

Meta reinforcement learning sets a distribution over a set of tasks on which the agent can train at will, then is asked to learn an optimal policy for any test task efficiently. In this paper, we consider a finite set of tasks modeled…

Machine Learning · Computer Science 2024-06-05 Mirco Mutti , Aviv Tamar

On the Model-Misspecification in Reinforcement Learning

The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function…

Machine Learning · Computer Science 2024-01-09 Yunfan Li , Lin Yang

A Reduction from Reinforcement Learning to No-Regret Online Learning

We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance…

Machine Learning · Computer Science 2020-01-03 Ching-An Cheng , Remi Tachet des Combes , Byron Boots , Geoff Gordon

Settling the Sample Complexity of Online Reinforcement Learning

A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a…

Machine Learning · Computer Science 2025-04-30 Zihan Zhang , Yuxin Chen , Jason D. Lee , Simon S. Du

Refined Regret for Adversarial MDPs with Linear Function Approximation

We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in…

Machine Learning · Computer Science 2023-06-05 Yan Dai , Haipeng Luo , Chen-Yu Wei , Julian Zimmert

Model-based Reinforcement Learning and the Eluder Dimension

We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather…

Machine Learning · Statistics 2014-11-04 Ian Osband , Benjamin Van Roy

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory…

Machine Learning · Computer Science 2023-01-31 Uri Sherman , Tomer Koren , Yishay Mansour

Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given…

Machine Learning · Computer Science 2023-11-07 Jiafan He , Heyang Zhao , Dongruo Zhou , Quanquan Gu