Related papers: Model-based Reinforcement Learning for Continuous …

Prior-dependent analysis of posterior sampling reinforcement learning with function approximation

This work advances randomized exploration in reinforcement learning (RL) with function approximation modeled by linear mixture MDPs. We establish the first prior-dependent Bayesian regret bound for RL with function approximation; and refine…

Machine Learning · Statistics 2024-03-19 Yingru Li, Zhi-Quan Luo

(More) Efficient Reinforcement Learning via Posterior Sampling

Most provably-efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning…

Machine Learning · Statistics 2013-12-30 Ian Osband, Daniel Russo, Benjamin Van Roy

Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces

We analyze the Bayesian regret of the Gaussian process posterior sampling reinforcement learning (GP-PSRL) algorithm. Posterior sampling is an effective heuristic for decision-making under uncertainty that has been used to develop…

Machine Learning · Statistics 2026-03-10 Hamish Flynn, Joe Watson, Ingmar Posner, Jan Peters

Posterior Sampling for Continuing Environments

We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments. The approach,…

Machine Learning · Computer Science 2025-10-15 Wanqiao Xu, Shi Dong, Benjamin Van Roy

Safe Posterior Sampling for Constrained MDPs with Bounded Constraint Violation

Constrained Markov decision processes (CMDPs) model scenarios of sequential decision making with multiple objectives that are increasingly important in many applications. However, the model is often unknown and must be learned online while…

Machine Learning · Computer Science 2023-01-30 Krishna C Kalagarla, Rahul Jain, Pierluigi Nuzzo

Model-Based Reinforcement Learning with Value-Targeted Regression

This paper studies model-based reinforcement learning (RL) for regret minimization. We focus on finite-horizon episodic RL where the transition model $P$ belongs to a known family of models $\mathcal{P}$, a special case of which is when…

Machine Learning · Computer Science 2020-06-02 Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

Online Learning for Stochastic Shortest Path Model via Posterior Sampling

We consider the problem of online reinforcement learning for the Stochastic Shortest Path (SSP) problem modeled as an unknown MDP with an absorbing state. We propose PSRL-SSP, a simple posterior sampling-based reinforcement learning…

Machine Learning · Computer Science 2021-06-11 Mehdi Jafarnia-Jahromi, Liyu Chen, Rahul Jain, Haipeng Luo

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions. The performance of an agent is measured by the regret after…

Machine Learning · Statistics 2022-09-30 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous…

Machine Learning · Computer Science 2024-05-30 Danil Provodin, Maurits Kaptein, Mykola Pechenizkiy

Posterior Sampling for Large Scale Reinforcement Learning

We propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is…

Machine Learning · Computer Science 2018-10-24 Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation

Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms…

Machine Learning · Computer Science 2023-11-07 Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yi-An Ma

Posterior Sampling for Deep Reinforcement Learning

Despite remarkable successes, deep reinforcement learning algorithms remain sample inefficient: they require an enormous amount of trial and error to find good policies. Model-based algorithms promise sample efficiency by building an…

Machine Learning · Computer Science 2023-05-19 Remo Sasso, Michelangelo Conserva, Paulo Rauber

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the…

Machine Learning · Statistics 2017-06-14 Ian Osband, Benjamin Van Roy

Near-optimal Reinforcement Learning in Factored MDPs

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action…

Machine Learning · Statistics 2014-11-04 Ian Osband, Benjamin Van Roy

Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous…

Machine Learning · Computer Science 2023-09-28 Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems

We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued and running control rewards are absent but volatilities of the state processes depend…

Machine Learning · Computer Science 2025-07-25 Yilie Huang, Yanwei Jia, Xun Yu Zhou

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the state-action space is large. A common practice is to parameterize the high-dimensional value and policy functions using given features. However…

Machine Learning · Computer Science 2019-06-14 Lin F. Yang, Mengdi Wang

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based…

Machine Learning · Computer Science 2023-01-10 Arnob Ghosh, Xingyu Zhou, Ness Shroff

Learning to Control in Metric Space with Optimal Regret

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function is unknown, but the state and action space is…

Machine Learning · Computer Science 2019-05-07 Lin F. Yang, Chengzhuo Ni, Mengdi Wang

Efficient Reinforcement Learning via Initial Pure Exploration

In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice…

Machine Learning · Computer Science 2017-06-08 Sudeep Raja Putta, Theja Tulabandhula