Related papers: A General Framework for Sample-Efficient Function …

Non-stationary Reinforcement Learning under General Function Approximation

General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function…

Machine Learning · Computer Science 2023-06-02 Songtao Feng, Ming Yin, Ruiquan Huang, Yu-Xiang Wang, Jing Yang, Yingbin Liang

MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning

Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration. Existing meta-RL algorithms are characterized by low sample efficiency, and mostly focus on low-dimensional…

Machine Learning · Computer Science 2024-03-18 Zohar Rimon, Tom Jurgenson, Orr Krupnik, Gilad Adler, Aviv Tamar

A General Markov Decision Process Framework for Directly Learning Optimal Control Policies

We consider a new form of reinforcement learning (RL) that is based on opportunities to directly learn the optimal control policy and a general Markov decision process (MDP) framework devised to support these opportunities. Derivations of…

Machine Learning · Computer Science 2021-04-02 Yingdong Lu, Mark S. Squillante, Chai Wah Wu

Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning

We present the first finite-sample analysis of policy evaluation in robust average-reward Markov Decision Processes (MDPs). Prior work in this setting has established only asymptotic convergence guarantees, leaving open the question of…

Machine Learning · Statistics 2025-12-11 Yang Xu, Washim Uddin Mondal, Vaneet Aggarwal

GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond

We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes Markov decision process (MDP), partially observable Markov decision process (POMDP), and predictive state…

Machine Learning · Computer Science 2023-07-03 Han Zhong, Wei Xiong, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang

Model Selection in Reinforcement Learning with General Function Approximations

We consider model selection for classic Reinforcement Learning (RL) environments -- Multi-Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know…

Machine Learning · Statistics 2022-07-08 Avishek Ghosh, Sayak Ray Chowdhury

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches

We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy. We design new algorithms for RL with a generic…

Machine Learning · Computer Science 2019-05-31 Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

Active Reinforcement Learning over MDPs

The past decade has seen the rapid development of Reinforcement Learning, which acquires impressive performance with numerous training resources. However, one of the greatest challenges in RL is generalization efficiency (i.e.,…

Machine Learning · Computer Science 2021-08-18 Qi Yang, Peng Yang, Ke Tang

Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency

We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces, which remains less investigated theoretically. To this end, we make the first attempt at bridging partial…

Machine Learning · Computer Science 2024-04-02 Qi Cai, Zhuoran Yang, Zhaoran Wang

Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation

We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. Specifically, we propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP), which…

Machine Learning · Computer Science 2024-04-22 Jianliang He, Han Zhong, Zhuoran Yang

Generalized Linear Markov Decision Process

The linear Markov Decision Process (MDP) framework offers a principled foundation for reinforcement learning (RL) with strong theoretical guarantees and sample efficiency. However, its restrictive assumption, that both transition dynamics…

Machine Learning · Statistics 2025-06-03 Sinian Zhang, Kaicheng Zhang, Ziping Xu, Tianxi Cai, Doudou Zhou

Learning Near Optimal Policies with Low Inherent Bellman Error

We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error, a condition normally employed to show convergence of approximate value…

Machine Learning · Computer Science 2020-06-30 Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill

End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions

We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying linear Bellman completeness -- a fundamental setting where the Bellman backup of any linear value function remains…

Machine Learning · Computer Science 2026-03-25 Zakaria Mhammedi, Alexander Rakhlin, Nneka Okolo

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior. Whether we optimize for regret, sample…

Machine Learning · Computer Science 2021-11-19 Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to…

Machine Learning · Computer Science 2024-12-24 Zhengqi Wu, Renyuan Xu

Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes

We develop several provably efficient model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov Decision Processes (MDPs). We consider both the online setting and the setting with access to a simulator. In the…

Machine Learning · Computer Science 2023-06-29 Zihan Zhang, Qiaomin Xie

Reinforcement Learning Measurement Model

Interactive assessments generate sequential process data that are not well handled by conventional item response models. Existing MDP-based measurement approaches, such as the Markov decision process measurement model (MDP-MM, LaMar, 2018),…

Methodology · Statistics 2026-05-12 Wenqian Xu, Feng Ji

Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions

Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces. In this paper, we advance this effort by presenting…

Machine Learning · Computer Science 2024-10-04 Zakaria Mhammedi

Reward-Free Exploration for Reinforcement Learning

Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new…

Machine Learning · Computer Science 2020-02-10 Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Most of the existing works for reinforcement learning (RL) with general function approximation (FA) focus on understanding the statistical complexity or regret bounds. However, the computation complexity of such approaches is far from being…

Machine Learning · Computer Science 2023-04-19 Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang