Related papers: Online Sparse Reinforcement Learning

Online Reinforcement Learning in Markov Decision Process Using Linear Programming

We consider online reinforcement learning in episodic Markov decision process (MDP) with unknown transition function and stochastic rewards drawn from some fixed but unknown distribution. The learner aims to learn the optimal policy and…

Machine Learning · Computer Science 2024-03-12 Vincent Leon , S. Rasoul Etesami

Online learning in MDPs with linear function approximation and bandit feedback

We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets…

Machine Learning · Computer Science 2021-06-15 Gergely Neu , Julia Olkhovskaya

Model-Based Reinforcement Learning with Double Oracle Efficiency in Policy Optimization and Offline Estimation

Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While…

Machine Learning · Computer Science 2026-05-04 Haichen Hu , Jian Qian , David Simchi-Levi

Sublinear Regret for Learning POMDPs

We study the model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over…

Machine Learning · Computer Science 2022-07-19 Yi Xiong , Ningyuan Chen , Xuefeng Gao , Xiang Zhou

Model-based Reinforcement Learning and the Eluder Dimension

We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather…

Machine Learning · Statistics 2014-11-04 Ian Osband , Benjamin Van Roy

Fast rates for online learning in Linearly Solvable Markov Decision Processes

We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs. In the stationary version of this problem, a learner interacts with its environment by directly controlling the state…

Machine Learning · Computer Science 2017-06-07 Gergely Neu , Vicenç Gómez

Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss

We consider online learning for episodic stochastically constrained Markov decision processes (CMDPs), which plays a central role in ensuring the safety of reinforcement learning. Here the loss function can vary arbitrarily across the…

Machine Learning · Computer Science 2021-10-19 Shuang Qiu , Xiaohan Wei , Zhuoran Yang , Jieping Ye , Zhaoran Wang

Fast Rates for the Regret of Offline Reinforcement Learning

We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted $Q$-iteration…

Machine Learning · Computer Science 2023-07-13 Yichun Hu , Nathan Kallus , Masatoshi Uehara

Dynamic Regret of Online Markov Decision Processes

We investigate online Markov Decision Processes (MDPs) with adversarially changing loss functions and known transitions. We choose dynamic regret as the performance measure, defined as the performance difference between the learner and any…

Machine Learning · Computer Science 2022-08-29 Peng Zhao , Long-Fei Li , Zhi-Hua Zhou

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$. However, it…

Machine Learning · Computer Science 2023-05-16 Kaixuan Ji , Qingyue Zhao , Jiafan He , Weitong Zhang , Quanquan Gu

Online Learning in Weakly Coupled Markov Decision Processes: A Convergence Time Study

We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well…

Optimization and Control · Mathematics 2017-09-12 Xiaohan Wei , Hao Yu , Michael J. Neely

Online Learning in Kernelized Markov Decision Processes

We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ nonparametric Gaussian…

Machine Learning · Computer Science 2019-01-04 Sayak Ray Chowdhury , Aditya Gopalan

Learning Adversarial MDPs with Stochastic Hard Constraints

We study online learning in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints, under bandit feedback. We consider three scenarios. In the first one, we address general CMDPs, where we…

Machine Learning · Computer Science 2025-02-10 Francesco Emanuele Stradi , Matteo Castiglioni , Alberto Marchesi , Nicola Gatti

Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback

We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback, without prior knowledge on transitions or access to simulators. We introduce two algorithms that achieve improved regret…

Machine Learning · Computer Science 2023-10-19 Haolin Liu , Chen-Yu Wei , Julian Zimmert

Online Convex Optimization in Adversarial Markov Decision Processes

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. We show…

Machine Learning · Computer Science 2019-05-21 Aviv Rosenberg , Yishay Mansour

Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^{\pi}$ Realizability for Deterministic Dynamics

We study computationally and statistically efficient reinforcement learning under the linear $Q^{\pi}$ realizability assumption, where any policy's $Q$-function is linear in a given state-action feature representation. Prior methods in this…

Machine Learning · Computer Science 2026-03-03 Yijing Ke , Zihan Zhang , Ruosong Wang

A Doubly Robust Approach to Sparse Reinforcement Learning

We propose a new regret minimization algorithm for episodic sparse linear Markov decision process (SMDP) where the state-transition distribution is a linear function of observed features. The only previously known algorithm for SMDP…

Machine Learning · Statistics 2023-10-25 Wonyoung Kim , Garud Iyengar , Assaf Zeevi

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory…

Machine Learning · Computer Science 2023-01-31 Uri Sherman , Tomer Koren , Yishay Mansour

Offline-Online Reinforcement Learning for Linear Mixture MDPs

We study offline-online reinforcement learning in linear mixture Markov decision processes (MDPs) under environment shift. In the offline phase, data are collected by an unknown behavior policy and may come from a mismatched environment,…

Machine Learning · Computer Science 2026-04-15 Zhongjun Zhang , Sean R. Sinclair

Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs

Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether…

Machine Learning · Statistics 2024-08-09 Kevin Tan , Wei Fan , Yuting Wei