Related papers: Entropy-regularized Point-based Value Iteration

Understanding the impact of entropy on policy optimization

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim…

Machine Learning · Computer Science · 2019-06-11 · Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

Entropy Regularization for Population Estimation

Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower-variance estimates of the mean reward in the optimize-and-estimate…

Machine Learning · Computer Science · 2022-08-26 · Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

Predictable Interval MDPs through Entropy Regularization

Regularization of control policies using entropy can be instrumental in adjusting predictability of real-world systems. Applications benefiting from such approaches range from, e.g., cybersecurity, which aims at maximal unpredictability, to…

Systems and Control · Electrical Eng. & Systems · 2026-02-18 · Menno van Zutphen, Giannis Delimpaltadakis, Maurice Heemels, Duarte Antunes

Entropy-Regularized Inference: A Predictive Approach

Predictive inference requires balancing statistical accuracy against informational complexity, yet the choice of complexity measure is usually imposed rather than derived. We treat econometric objects as predictive rules, mappings from…

Statistics Theory · Mathematics · 2026-02-16 · Nicholas G. Polson, Daniel Zantedeschi

On the Convergence of Approximate and Regularized Policy Iteration Schemes

Entropy-regularized algorithms such as Soft Q-learning and Soft Actor-Critic recently showed state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL…

Machine Learning · Statistics · 2019-10-15 · Elena Smirnova, Elvis Dohmatob

Alignment Entropy Regularization

Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences. In this paper, we use entropy to measure a model's uncertainty, i.e.…

Computation and Language · Computer Science · 2022-12-26 · Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley

Entropy-Regularized Partially Observed Markov Decision Processes

We investigate partially observed Markov decision processes (POMDPs) with cost functions regularized by entropy terms describing state, observation, and control uncertainty. Standard POMDP techniques are shown to offer bounded-error…

Systems and Control · Electrical Eng. & Systems · 2023-05-10 · Timothy L. Molloy, Girish N. Nair

Entropy Regularised Deterministic Optimal Control: From Path Integral Solution to Sample-Based Trajectory Optimisation

Sample-based trajectory optimisers are a promising tool for the control of robotics with non-differentiable dynamics and cost functions. Contemporary approaches derive from a restricted subclass of stochastic optimal control where the…

Robotics · Computer Science · 2021-10-07 · Tom Lefebvre, Guillaume Crevecoeur

Enhancing Topic Extraction in Recommender Systems with Entropy Regularization

In recent years, many recommender systems have utilized textual data for topic extraction to enhance interpretability. However, our findings reveal a noticeable deficiency in the coherence of keywords within topics, resulting in low…

Computation and Language · Computer Science · 2023-06-14 · Xuefei Jiang, Dairui Liu, Ruihai Dong

IRL with Partial Observations using the Principle of Uncertain Maximum Entropy

The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world…

Machine Learning · Computer Science · 2022-08-16 · Kenneth Bogert, Yikang Gui, Prashant Doshi

Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning

This paper aims to establish an entropy-regularized value-based reinforcement learning method that can ensure the monotonic improvement of policies at each policy update. Unlike previously proposed lower-bounds on policy improvement in…

Machine Learning · Computer Science · 2020-08-26 · Lingwei Zhu, Takamitsu Matsubara

Marginalized State Distribution Entropy Regularization in Policy Optimization

Entropy regularization is used to get improved optimization performance in reinforcement learning tasks. A common form of regularization is to maximize policy entropy to avoid premature convergence and lead to more stochastic policies for…

Machine Learning · Computer Science · 2019-12-12 · Riashat Islam, Zafarali Ahmed, Doina Precup

Entropy-Augmented Entropy-Regularized Reinforcement Learning and a Continuous Path from Policy Gradient to Q-Learning

Entropy augmented to reward is known to soften the greedy argmax policy into a softmax policy. Entropy augmentation is reformulated and leads to a motivation to introduce an additional entropy term to the objective function in the form of…

Machine Learning · Computer Science · 2020-06-08 · Donghoon Lee

Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks

In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which…

Machine Learning · Computer Science · 2023-10-10 · Andrew Starnes, Anton Dereventsov, Clayton Webster

Entropy-Guided Control Improvisation

High-level declarative constraints provide a powerful (and popular) way to define and construct control policies; however, most synthesis algorithms do not support specifying the degree of randomness (unpredictability) of the resulting…

Robotics · Computer Science · 2021-06-30 · Marcell Vazquez-Chanlatte, Sebastian Junges, Daniel J. Fremont, Sanjit Seshia

Risk Sensitive Model-Based Reinforcement Learning using Uncertainty Guided Planning

Identifying uncertainty and taking mitigating actions is crucial for safe and trustworthy reinforcement learning agents, especially when deployed in high-risk environments. In this paper, risk sensitivity is promoted in a model-based…

Machine Learning · Computer Science · 2021-11-10 · Stefan Radic Webster, Peter Flach

Regularized Policies are Reward Robust

Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a local optimal policy. The primary motivation for…

Machine Learning · Computer Science · 2021-01-19 · Hisham Husain, Kamil Ciosek, Ryota Tomioka

A Comparative Theoretical Analysis of Entropy Control Methods in Reinforcement Learning

Reinforcement learning (RL) has become a key approach for enhancing reasoning in large language models (LLMs), yet scalable training is often hindered by the rapid collapse of policy entropy, which leads to premature convergence and…

Machine Learning · Computer Science · 2026-04-14 · Ming Lei, Christophe Baehr

From Entropy to Calibrated Uncertainty: Training Language Models to Reason About Uncertainty

Large Language Models (LLMs) that can express interpretable and calibrated uncertainty are crucial in high-stakes domains. While methods to compute uncertainty post-hoc exist, they are often sampling-based and therefore computationally…

Machine Learning · Computer Science · 2026-03-09 · Azza Jenane, Nassim Walha, Lukas Kuhn, Florian Buettner

Entropy Minimization for Optimization of Expensive, Unimodal Functions

Maximization of an expensive, unimodal function under random observations has been an important problem in hyperparameter tuning. It features expensive function evaluations (which means small budgets) and a high level of noise. We develop…

Optimization and Control · Mathematics · 2023-02-23 · Xiaohe Luo, Warren B. Powell