Related papers: Maximum Entropy RL (Provably) Solves Some Robust R…

When Maximum Entropy Misleads Policy Optimization

The Maximum Entropy Reinforcement Learning (MaxEnt RL) framework is a leading approach for achieving efficient learning and robust performance across many RL tasks. However, MaxEnt methods have also been shown to struggle with…

Machine Learning · Computer Science · 2025-06-13 · Ruipeng Zhang, Ya-Chien Chang, Sicun Gao

Do You Need the Entropy Reward (in Practice)?

Maximum entropy (MaxEnt) RL maximizes a combination of the original task reward and an entropy reward. It is believed that the regularization imposed by entropy, on both policy improvement and policy evaluation, together contributes to good…

Machine Learning · Computer Science · 2022-02-01 · Haonan Yu, Haichao Zhang, Wei Xu

Revisiting Maximum Entropy Inverse Reinforcement Learning: New Perspectives and Algorithms

We provide new perspectives and inference algorithms for Maximum Entropy (MaxEnt) Inverse Reinforcement Learning (IRL), which provides a principled method to find a most non-committal reward function consistent with given expert…

Machine Learning · Computer Science · 2021-06-08 · Aaron J. Snoswell, Surya P. N. Singh, Nan Ye

Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization

Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving…

Machine Learning · Computer Science · 2025-12-23 · Pierre-François Massiani, Alexander von Rohr, Lukas Haverbeck, Sebastian Trimpe

If MaxEnt RL is the Answer, What is the Question?

Experimentally, it has been observed that humans and animals often make decisions that do not maximize their expected utility, but rather choose outcomes randomly, with probability proportional to expected utility. Probability matching, as…

Machine Learning · Computer Science · 2019-10-07 · Benjamin Eysenbach, Sergey Levine

Maximizing Confidence Alone Improves Reasoning

Reinforcement learning (RL) has enabled machine learning models to achieve significant advances in many fields. Most recently, RL has empowered frontier language models to solve challenging math, science, and coding problems. However,…

Machine Learning · Computer Science · 2025-06-30 · Mihir Prabhudesai, Lili Chen, Alex Ippoliti, Katerina Fragkiadaki, Hao Liu, Deepak Pathak

Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy…

Machine Learning · Computer Science · 2024-10-29 · Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

To the Max: Reinventing Reward in Reinforcement Learning

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the…

Machine Learning · Computer Science · 2025-02-25 · Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt

On the Robustness of Safe Reinforcement Learning under Observational Perturbations

Safe reinforcement learning (RL) trains a policy to maximize the task reward while satisfying safety constraints. While prior works focus on the performance optimality, we find that the optimal solutions of many safe RL problems are not…

Machine Learning · Computer Science · 2023-03-03 · Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Jie Tan, Bo Li, Ding Zhao

Action Redundancy in Reinforcement Learning

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization. However, action entropy does not necessarily coincide with state entropy, e.g., when multiple…

Machine Learning · Computer Science · 2021-07-27 · Nir Baram, Guy Tennenholtz, Shie Mannor

Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation

Entropy Regularisation is a widely adopted technique that enhances policy optimisation performance and stability. A notable form of entropy regularisation is augmenting the objective with an entropy term, thereby simultaneously optimising…

Machine Learning · Computer Science · 2024-07-26 · Jean Seong Bjorn Choe, Jong-Kook Kim

A Max-Min Entropy Framework for Reinforcement Learning

In this paper, we propose a max-min entropy framework for reinforcement learning (RL) to overcome the limitation of the soft actor-critic (SAC) algorithm implementing the maximum entropy RL in model-free sample-based learning. Whereas the…

Machine Learning · Computer Science · 2021-12-21 · Seungyul Han, Youngchul Sung

Extreme Q-Learning: MaxEnt RL without Entropy

Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an infinite number of possible actions. In this work, we introduce a new update rule for…

Machine Learning · Computer Science · 2023-03-02 · Divyansh Garg, Joey Hejna, Matthieu Geist, Stefano Ermon

Train Hard, Fight Easy: Robust Meta Reinforcement Learning

A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients. Meta-RL (MRL) addresses this issue by learning a meta-policy that adapts to new tasks. Standard MRL methods…

Machine Learning · Computer Science · 2023-10-03 · Ido Greenberg, Shie Mannor, Gal Chechik, Eli Meirom

Reinforcement Learning with Convex Constraints

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the…

Machine Learning · Computer Science · 2021-01-29 · Sobhan Miryoosefi, Kianté Brantley, Hal Daumé, Miroslav Dudik, Robert Schapire

MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Reinforcement learning (RL) algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards. Most common RL algorithms use undirected exploration, i.e., select random sequences of…

Machine Learning · Computer Science · 2025-08-01 · Bhavya Sukhija, Stelian Coros, Andreas Krause, Pieter Abbeel, Carmelo Sferrazza

Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples

Despite the fact that deep reinforcement learning (RL) has surpassed human-level performances in various tasks, it still has several fundamental challenges. First, most RL methods require intensive data from the exploration of the…

Machine Learning · Computer Science · 2021-07-06 · Zhe Xu, Bo Wu, Aditya Ojha, Daniel Neider, Ufuk Topcu

Maximum Entropy Deep Inverse Reinforcement Learning

This paper presents a general framework for exploiting the representational capacity of neural networks to approximate complex, nonlinear reward functions in the context of solving the inverse reinforcement learning (IRL) problem. We show…

Machine Learning · Computer Science · 2016-03-14 · Markus Wulfmeier, Peter Ondruska, Ingmar Posner

Maximum Causal Entropy Inverse Constrained Reinforcement Learning

When deploying artificial agents in real-world environments where they interact with humans, it is crucial that their behavior is aligned with the values, social norms or other requirements of that environment. However, many environments…

Machine Learning · Computer Science · 2023-05-05 · Mattijs Baert, Pietro Mazzaglia, Sam Leroux, Pieter Simoens

Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures

Maximum entropy reinforcement learning integrates exploration into policy learning by providing additional intrinsic rewards proportional to the entropy of some distribution. In this paper, we propose a novel approach in which the intrinsic…

Machine Learning · Computer Science · 2025-09-30 · Adrien Bolland, Gaspard Lambrechts, Damien Ernst