Related papers: Online Model Selection for Reinforcement Learning …

200 papers

The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions. Further progress hinges on combining RL with modern function approximators such as kernel functions and deep neural…

Machine Learning · Computer Science 2021-01-01 Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan

A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of actions or a policy to maximize the reward collected or minimize the regret incurred in a finite time horizon. For several RL problems in operation…

Machine Learning · Computer Science 2016-08-18 K J Prabuchandran, Tejas Bodas, Theja Tulabandhula

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang, Yiming Zong, Yinyu Ye

Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case…

Machine Learning · Statistics 2026-05-28 Wonyoung Kim, Min-Hwan Oh, Garud Iyengar, Assaf Zeevi

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based…

Machine Learning · Computer Science 2023-01-10 Arnob Ghosh, Xingyu Zhou, Ness Shroff

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action…

Machine Learning · Statistics 2014-11-04 Ian Osband, Benjamin Van Roy

Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret bound, where $T$ is the number of interactions with the MDP.…

Machine Learning · Computer Science 2021-02-19 Jiafan He, Dongruo Zhou, Quanquan Gu

We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…

Machine Learning · Computer Science 2020-06-11 Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

In many applications of Reinforcement Learning (RL), it is critically important that the algorithm performs safely, such that instantaneous hard constraints are satisfied at each step, and unsafe states and actions are avoided. However,…

Machine Learning · Computer Science 2023-02-10 Ming Shi, Yingbin Liang, Ness Shroff

We present an algorithm based on the Optimism in the Face of Uncertainty (OFU) principle which is able to learn Reinforcement Learning (RL) modeled by Markov decision process (MDP) with finite state-action space efficiently. By…

Machine Learning · Computer Science 2020-01-01 Zihan Zhang, Xiangyang Ji

In some reinforcement learning problems an agent may be provided with a set of input policies, perhaps learned from prior experience or provided by advisors. We present a reinforcement learning with policy advice (RLPA) algorithm which…

Machine Learning · Statistics 2013-07-19 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know…

Machine Learning · Statistics 2022-07-08 Avishek Ghosh, Sayak Ray Chowdhury

Meta reinforcement learning sets a distribution over a set of tasks on which the agent can train at will, then is asked to learn an optimal policy for any test task efficiently. In this paper, we consider a finite set of tasks modeled…

Machine Learning · Computer Science 2024-06-05 Mirco Mutti, Aviv Tamar

The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function…

Machine Learning · Computer Science 2024-01-09 Yunfan Li, Lin Yang

We present a reduction from reinforcement learning (RL) to no-regret online learning based on the saddle-point formulation of RL, by which "any" online algorithm with sublinear regret can generate policies with provable performance…

Machine Learning · Computer Science 2020-01-03 Ching-An Cheng, Remi Tachet des Combes, Byron Boots, Geoff Gordon

A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a…

Machine Learning · Computer Science 2025-04-30 Zihan Zhang, Yuxin Chen, Jason D. Lee, Simon S. Du

We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in…

Machine Learning · Computer Science 2023-06-05 Yan Dai, Haipeng Luo, Chen-Yu Wei, Julian Zimmert

We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather…

Machine Learning · Statistics 2014-11-04 Ian Osband, Benjamin Van Roy

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory…

Machine Learning · Computer Science 2023-01-31 Uri Sherman, Tomer Koren, Yishay Mansour

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given…

Machine Learning · Computer Science 2023-11-07 Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu