Related papers: Diversity Actor-Critic: Sample-Aware Entropy Regul…

Striving for Simplicity and Performance in Off-Policy DRL: Output Normalization and Non-Uniform Sampling

We aim to develop off-policy DRL algorithms that not only exceed state-of-the-art performance but are also simple and minimalistic. For standard continuous control benchmarks, Soft Actor-Critic (SAC), which employs entropy maximization,…

Machine Learning · Computer Science 2020-12-08 Che Wang , Yanqiu Wu , Quan Vuong , Keith Ross

Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience

Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm, essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off between expected return and entropy (randomness in the…

Machine Learning · Computer Science 2021-09-27 Chayan Banerjee , Zhiyong Chen , Nasimul Noman

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle toward hyperparameters as well as \cut{being}sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm…

Machine Learning · Computer Science 2019-06-10 Patrick Nadeem Ward , Ariella Smofsky , Avishek Joey Bose

Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary…

Machine Learning · Computer Science 2019-02-18 Gang Chen , Yiming Peng

DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic

We present a novel extension to the family of Soft Actor-Critic (SAC) algorithms. We argue that based on the Maximum Entropy Principle, discrete SAC can be further improved via additional statistical constraints derived from a surrogate…

Machine Learning · Computer Science 2025-06-24 Dexter Neo , Tsuhan Chen

Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model

Model-free deep reinforcement learning has achieved great success in many domains, such as video games, recommendation systems and robotic control tasks. In continuous control tasks, widely used policies with Gaussian distributions results…

Machine Learning · Computer Science 2023-06-05 Lingwei Peng , Hui Qian , Zhebang Shen , Chao Zhang , Fei Li

Effective Reinforcement Learning Control using Conservative Soft Actor-Critic

Reinforcement Learning (RL) has shown great potential in complex control tasks, particularly when combined with deep neural networks within the Actor-Critic (AC) framework. However, in practical applications, balancing exploration, learning…

Robotics · Computer Science 2026-02-25 Zhiwei Shang , Xinyi Yuan , Wenjun Huang , Yunduan Cui , Di Chen , Meixin Zhu

Sample Efficient Actor-Critic with Experience Replay

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several…

Machine Learning · Computer Science 2017-07-11 Ziyu Wang , Victor Bapst , Nicolas Heess , Volodymyr Mnih , Remi Munos , Koray Kavukcuoglu , Nando de Freitas

Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning

We propose a fully distributed actor-critic algorithm approximated by deep neural networks, named \textit{Diff-DAC}, with application to single-task and to average multitask reinforcement learning (MRL). Each agent has access to data from…

Machine Learning · Computer Science 2020-10-27 Sergio Valcarcel Macua , Aleksi Tukiainen , Daniel García-Ocaña Hernández , David Baldazo , Enrique Munoz de Cote , Santiago Zazo

Divergence-Regularized Multi-Agent Actor-Critic

Entropy regularization is a popular method in reinforcement learning (RL). Although it has many advantages, it alters the RL objective of the original Markov Decision Process (MDP). Though divergence regularization has been proposed to…

Machine Learning · Computer Science 2022-06-22 Kefan Su , Zongqing Lu

On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics

Risk-aware Reinforcement Learning (RL) algorithms like SAC and TD3 were shown empirically to outperform their risk-neutral counterparts in a variety of continuous-action tasks. However, the theoretical basis for the pessimistic objectives…

Machine Learning · Computer Science 2024-05-27 Michal Nauman , Marek Cygan

Diffusion Actor-Critic with Entropy Regulator

Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with…

Machine Learning · Computer Science 2024-12-24 Yinuo Wang , Likun Wang , Yuxuan Jiang , Wenjun Zou , Tong Liu , Xujie Song , Wenxuan Wang , Liming Xiao , Jiang Wu , Jingliang Duan , Shengbo Eben Li

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

Designing off-policy reinforcement learning algorithms is typically a very challenging task, because a desirable iteration update often involves an expectation over an on-policy distribution. Prior off-policy actor-critic (AC) algorithms…

Machine Learning · Computer Science 2021-07-20 Tengyu Xu , Zhuoran Yang , Zhaoran Wang , Yingbin Liang

Soft Actor-Critic with Cross-Entropy Policy Optimization

Soft Actor-Critic (SAC) is one of the state-of-the-art off-policy reinforcement learning (RL) algorithms that is within the maximum entropy based RL framework. SAC is demonstrated to perform very well in a list of continous control tasks…

Machine Learning · Computer Science 2021-12-22 Zhenyang Shi , Surya P. N. Singh

Soft Actor-Critic Algorithm with Truly-satisfied Inequality Constraint

Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful…

Machine Learning · Computer Science 2023-07-04 Taisuke Kobayashi

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

Many policy gradient methods are variants of Actor-Critic (AC), where a value function (critic) is learned to facilitate updating the parameterized policy (actor). The update to the actor involves a log-likelihood update weighted by the…

Machine Learning · Computer Science 2023-03-02 Samuel Neumann , Sungsu Lim , Ajin Joseph , Yangchen Pan , Adam White , Martha White

Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past

Soft Actor-Critic (SAC) is an off-policy actor-critic deep reinforcement learning (DRL) algorithm based on maximum entropy reinforcement learning. By combining off-policy updates with an actor-critic formulation, SAC achieves…

Machine Learning · Computer Science 2019-06-11 Che Wang , Keith Ross

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and natural actor-critic (NAC) algorithms has been…

Machine Learning · Computer Science 2021-02-15 Tengyu Xu , Zhe Wang , Yingbin Liang

Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR

Soft actor-critic (SAC) is a popular algorithm for max-entropy reinforcement learning. In practice, the energy-based policies in SAC are often approximated using simple policy classes for efficiency, sacrificing the expressiveness and…

Machine Learning · Computer Science 2026-01-01 Yuyang Zhang , Yang Hu , Bo Dai , Na Li

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

Machine Learning · Computer Science 2019-09-16 Tuomas Haarnoja , Aurick Zhou , Kristian Hartikainen , George Tucker , Sehoon Ha , Jie Tan , Vikash Kumar , Henry Zhu , Abhishek Gupta , Pieter Abbeel , Sergey Levine