Related papers: Implicit Distributional Reinforcement Learning

Distributional Reinforcement Learning via the Cramér Distance

This paper explores the application of the Soft Actor-Critic (SAC) algorithm within a Distributional Reinforcement Learning setting and introduces an implementation of such algorithm named Cramér-based Distributional Soft Actor-Critic…

Machine Learning · Computer Science 2026-05-12 Vanya Aziz , Ivo Nowak , E. M. T Hendrix

Decoupling Value and Policy for Generalization in Reinforcement Learning

Standard deep reinforcement learning algorithms use a shared representation for the policy and value function, especially when training directly from images. However, we argue that more information is needed to accurately estimate the value…

Machine Learning · Computer Science 2021-06-16 Roberta Raileanu , Rob Fergus

Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning

We propose a fully distributed actor-critic algorithm approximated by deep neural networks, named Diff-DAC, with application to single-task and to average multitask reinforcement learning (MRL). Each agent has access to data from…

Machine Learning · Computer Science 2020-10-27 Sergio Valcarcel Macua , Aleksi Tukiainen , Daniel García-Ocaña Hernández , David Baldazo , Enrique Munoz de Cote , Santiago Zazo

Soft Actor-Critic with Beta Policy via Implicit Reparameterization Gradients

Recent advances in deep reinforcement learning have achieved impressive results in a wide range of complex tasks, but poor sample efficiency remains a major obstacle to real-world deployment. Soft actor-critic (SAC) mitigates this problem…

Machine Learning · Computer Science 2024-09-10 Luca Della Libera

Distributional Soft Actor-Critic with Diffusion Policy

Reinforcement learning has been proven to be highly effective in handling complex control tasks. Traditional methods typically use unimodal distributions, such as Gaussian distributions, to model the output of value distributions. However,…

Machine Learning · Computer Science 2025-07-14 Tong Liu , Yinuo Wang , Xujie Song , Wenjun Zou , Liangfa Chen , Likun Wang , Bin Shuai , Jingliang Duan , Shengbo Eben Li

Distributional Advantage Actor-Critic

In traditional reinforcement learning, an agent maximizes the reward collected during its interaction with the environment by approximating the optimal policy through the estimation of value functions. Typically, given a state s and action…

Machine Learning · Computer Science 2018-06-20 Shangda Li , Selina Bing , Steven Yang

Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors

In reinforcement learning (RL), function approximation errors are known to easily lead to the Q-value overestimations, thus greatly reducing policy performance. This paper presents a distributional soft actor-critic (DSAC) algorithm, which…

Machine Learning · Computer Science 2021-06-14 Jingliang Duan , Yang Guan , Shengbo Eben Li , Yangang Ren , Bo Cheng

Conservative Offline Distributional Reinforcement Learning

Many reinforcement learning (RL) problems in practice are offline, learning purely from observational data. A key challenge is how to ensure the learned policy is safe, which requires quantifying the risk associated with different actions.…

Machine Learning · Computer Science 2021-10-28 Yecheng Jason Ma , Dinesh Jayaraman , Osbert Bastani

Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

In this paper, we propose a distributed off-policy actor critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update…

Machine Learning · Computer Science 2019-03-25 Yan Zhang , Michael M. Zavlanos

D2 Actor Critic: Diffusion Actor Meets Distributional Critic

We introduce D2AC, a new model-free reinforcement learning (RL) algorithm designed to train expressive diffusion policies online effectively. At its core is a policy improvement objective that avoids the high variance of typical policy…

Machine Learning · Computer Science 2026-05-25 Lunjun Zhang , Shuo Han , Hanrui Lyu , Bradly C Stadie

GRAC: Self-Guided and Self-Regularized Actor-Critic

Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network…

Machine Learning · Computer Science 2020-11-12 Lin Shao , Yifan You , Mengyuan Yan , Qingyun Sun , Jeannette Bohg

CGAR: Critic Guided Action Redistribution in Reinforcement Leaning

Training a game-playing reinforcement learning agent requires multiple interactions with the environment. Ignorant random exploration may cause a waste of time and resources. It's essential to alleviate such waste. As discussed in this…

Machine Learning · Computer Science 2022-06-24 Tairan Huang , Xu Li , Hao Li , Mingming Sun , Ping Li

Discriminator Soft Actor Critic without Extrinsic Rewards

It is difficult to be able to imitate well in unknown states from a small amount of expert data and sampling data. Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution…

Machine Learning · Computer Science 2020-02-03 Daichi Nishio , Daiki Kuyoshi , Toi Tsuneda , Satoshi Yamane

Fully Distributed Actor-Critic Architecture for Multitask Deep Reinforcement Learning

We propose a fully distributed actor-critic architecture, named Diff-DAC, with application to multitask reinforcement learning (MRL). During the learning process, agents communicate their value and policy parameters to their neighbours,…

Machine Learning · Computer Science 2021-10-26 Sergio Valcarcel Macua , Ian Davies , Aleksi Tukiainen , Enrique Munoz de Cote

Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for…

Machine Learning · Computer Science 2018-10-16 Ilya Kostrikov , Kumar Krishna Agrawal , Debidatta Dwibedi , Sergey Levine , Jonathan Tompson

DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning

We present Distributional Soft Actor-Critic (DSAC), a distributional reinforcement learning (RL) algorithm that combines the strengths of distributional information of accumulated rewards and entropy-driven exploration from Soft…

Machine Learning · Computer Science 2025-07-01 Xiaoteng Ma , Junyao Chen , Li Xia , Jun Yang , Qianchuan Zhao , Zhengyuan Zhou

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm…

Machine Learning · Computer Science 2019-06-10 Patrick Nadeem Ward , Ariella Smofsky , Avishek Joey Bose

Adversarial Imitation Learning via Boosting

Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy…

Machine Learning · Computer Science 2024-04-15 Jonathan D. Chang , Dhruv Sreenivas , Yingbing Huang , Kianté Brantley , Wen Sun

Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning

In offline reinforcement learning, it is necessary to manage out-of-distribution actions to prevent overestimation of value functions. One class of methods, the policy-regularized method, addresses this problem by constraining the target…

Machine Learning · Computer Science 2025-02-26 Linjiajie Fang , Ruoxue Liu , Jing Zhang , Wenjia Wang , Bing-Yi Jing

Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic

Hidden confounders that influence both states and actions can bias policy learning in reinforcement learning (RL), leading to suboptimal or non-generalizable behavior. Most RL algorithms ignore this issue, learning policies from…

Machine Learning · Computer Science 2025-06-09 Thanh Vinh Vo , Young Lee , Haozhe Ma , Chien Lu , Tze-Yun Leong