Related papers: Parametric Return Density Estimation for Reinforce…

A Nonparametric Off-Policy Policy Gradient

Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially observed in many widely popular policy gradient…

Machine Learning · Computer Science 2020-08-04 Samuele Tosatto, Joao Carvalho, Hany Abdulsamad, Jan Peters

R2L: Reliable Reinforcement Learning: Guaranteed Return & Reliable Policies in Reinforcement Learning

In this work, we address the problem of determining reliable policies in reinforcement learning (RL), with a focus on optimization under uncertainty and the need for performance guarantees. While classical RL algorithms aim at maximizing…

Machine Learning · Computer Science 2025-10-22 Nadir Farhi

On Distributional Reinforcement Learning in Chaotic Dynamical Systems

Chaotic dynamical systems pose a fundamental challenge for Reinforcement Learning (RL): exponential sensitivity to initial conditions induces high-variance bootstrap targets and poorly conditioned gradient updates. Chaotic dynamics arise…

Machine Learning · Computer Science 2026-05-29 James Rudd-Jones, Mirco Musolesi, María Pérez-Ortiz

Risk-Sensitive Reinforcement Learning via Policy Gradient Search

The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In…

Machine Learning · Computer Science 2022-05-25 Prashanth L. A., Michael Fu

Distributional Reinforcement Learning via Moment Matching

We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only…

Machine Learning · Computer Science 2020-12-10 Thanh Tang Nguyen, Sunil Gupta, Svetha Venkatesh

Policy Gradient Methods for Risk-Sensitive Distributional Reinforcement Learning with Provable Convergence

Risk-sensitive reinforcement learning (RL) is crucial for maintaining reliable performance in high-stakes applications. While traditional RL methods aim to learn a point estimate of the random cumulative cost, distributional RL (DRL) seeks…

Machine Learning · Computer Science 2025-02-03 Minheng Xiao, Xian Yu, Lei Ying

Density Constrained Reinforcement Learning

We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than the value functions considered by previous works. State density has a clear physical and…

Machine Learning · Computer Science 2021-06-25 Zengyi Qin, Yuxiao Chen, Chuchu Fan

Maximum Reward Formulation In Reinforcement Learning

Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon). However, several crucial applications in the real world, such as drug discovery,…

Machine Learning · Computer Science 2023-12-20 Sai Krishna Gottipati, Yashaswi Pathak, Rohan Nuttall, Sahir, Raviteja Chunduru, Ahmed Touati, Sriram Ganapathi Subramanian, Matthew E. Taylor, Sarath Chandar

Normality-Guided Distributional Reinforcement Learning for Continuous Control

Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value…

Machine Learning · Computer Science 2025-07-08 Ju-Seung Byun, Andrew Perrault

Post Reinforcement Learning Inference

We study estimation and inference using data collected by reinforcement learning (RL) algorithms. These algorithms adaptively experiment by interacting with individual units over multiple stages, updating their strategies based on past…

Machine Learning · Statistics 2025-10-06 Vasilis Syrgkanis, Ruohan Zhan

Value Flows

While most reinforcement learning methods today flatten the distribution of future returns to a single scalar value, distributional RL methods exploit the return distribution to provide stronger learning signals and to enable applications…

Machine Learning · Computer Science 2026-03-05 Perry Dong, Chongyi Zheng, Chelsea Finn, Dorsa Sadigh, Benjamin Eysenbach

Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

Distributional reinforcement learning algorithms have attempted to utilize estimated uncertainty for exploration, such as optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased…

Machine Learning · Computer Science 2023-12-06 Taehyun Cho, Seungyub Han, Heesoo Lee, Kyungjae Lee, Jungwoo Lee

Policy Evaluation in Distributional LQR (Extended Version)

Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard…

Optimization and Control · Mathematics 2024-03-26 Zifan Wang, Yulong Gao, Siyi Wang, Michael M. Zavlanos, Alessandro Abate, Karl H. Johansson

Policy Evaluation in Distributional LQR

Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the…

Optimization and Control · Mathematics 2023-03-27 Zifan Wang, Yulong Gao, Siyi Wang, Michael M. Zavlanos, Alessandro Abate, Karl H. Johansson

Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient

Off-policy Reinforcement Learning (RL) holds the promise of better data efficiency as it allows sample reuse and potentially enables safe interaction with the environment. Current off-policy policy gradient methods either suffer from high…

Machine Learning · Computer Science 2021-06-09 Samuele Tosatto, João Carvalho, Jan Peters

Online Robust Reinforcement Learning with General Function Approximation

In many real-world settings, reinforcement learning systems suffer performance degradation when the environment encountered at deployment differs from that observed during training. Distributionally robust reinforcement learning (DR-RL)…

Machine Learning · Computer Science 2026-03-05 Debamita Ghosh, George K. Atia, Yue Wang

Multilinear Tensor Low-Rank Approximation for Policy-Gradient Methods in Reinforcement Learning

Reinforcement learning (RL) aims to estimate the action to take given a (time-varying) state, with the goal of maximizing a cumulative reward function. Predominantly, there are two families of algorithms to solve RL problems: value-based…

Machine Learning · Computer Science 2025-01-10 Sergio Rozada, Hoi-To Wai, Antonio G. Marques

Hyperbolic Deep Reinforcement Learning

We propose a new class of deep reinforcement learning (RL) algorithms that model latent representations in hyperbolic space. Sequential decision-making requires reasoning about the possible future consequences of current behavior.…

Machine Learning · Computer Science 2022-10-05 Edoardo Cetin, Benjamin Chamberlain, Michael Bronstein, Jonathan J Hunt

Is Risk-Sensitive Reinforcement Learning Properly Resolved?

Due to the nature of risk management in learning applicable policies, risk-sensitive reinforcement learning (RSRL) has been realized as an important direction. RSRL is usually achieved by learning risk-sensitive objectives characterized by…

Machine Learning · Computer Science 2025-11-04 Ruiwen Zhou, Minghuan Liu, Kan Ren, Xufang Luo, Weinan Zhang, Dongsheng Li

Epistemic Risk-Sensitive Reinforcement Learning

We develop a framework for interacting with uncertain environments in reinforcement learning (RL) by leveraging preferences in the form of utility functions. We claim that there is value in considering different risk measures during…

Machine Learning · Computer Science 2021-02-23 Hannes Eriksson, Christos Dimitrakakis