Related papers: Entropy-regularized Point-based Value Iteration

200 papers

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim…

Machine Learning · Computer Science 2019-06-11 Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower-variance estimates of the mean reward in the optimize-and-estimate…

Machine Learning · Computer Science 2022-08-26 Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

Regularization of control policies using entropy can be instrumental in adjusting predictability of real-world systems. Applications benefiting from such approaches range from, e.g., cybersecurity, which aims at maximal unpredictability, to…

Systems and Control · Electrical Eng. & Systems 2026-02-18 Menno van Zutphen, Giannis Delimpaltadakis, Maurice Heemels, Duarte Antunes

Predictive inference requires balancing statistical accuracy against informational complexity, yet the choice of complexity measure is usually imposed rather than derived. We treat econometric objects as predictive rules, mappings from…

Statistics Theory · Mathematics 2026-02-16 Nicholas G. Polson, Daniel Zantedeschi

Entropy-regularized algorithms such as Soft Q-learning and Soft Actor-Critic recently showed state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL…

Machine Learning · Statistics 2019-10-15 Elena Smirnova, Elvis Dohmatob

Existing training criteria in automatic speech recognition (ASR) permit the model to freely explore more than one time alignment between the feature and label sequences. In this paper, we use entropy to measure a model's uncertainty, i.e.…

Computation and Language · Computer Science 2022-12-26 Ehsan Variani, Ke Wu, David Rybach, Cyril Allauzen, Michael Riley

We investigate partially observed Markov decision processes (POMDPs) with cost functions regularized by entropy terms describing state, observation, and control uncertainty. Standard POMDP techniques are shown to offer bounded-error…

Systems and Control · Electrical Eng. & Systems 2023-05-10 Timothy L. Molloy, Girish N. Nair

Sample-based trajectory optimisers are a promising tool for the control of robotics with non-differentiable dynamics and cost functions. Contemporary approaches derive from a restricted subclass of stochastic optimal control where the…

Robotics · Computer Science 2021-10-07 Tom Lefebvre, Guillaume Crevecoeur

In recent years, many recommender systems have utilized textual data for topic extraction to enhance interpretability. However, our findings reveal a noticeable deficiency in the coherence of keywords within topics, resulting in low…

Computation and Language · Computer Science 2023-06-14 Xuefei Jiang, Dairui Liu, Ruihai Dong

The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world…

Machine Learning · Computer Science 2022-08-16 Kenneth Bogert, Yikang Gui, Prashant Doshi

This paper aims to establish an entropy-regularized value-based reinforcement learning method that can ensure the monotonic improvement of policies at each policy update. Unlike previously proposed lower-bounds on policy improvement in…

Machine Learning · Computer Science 2020-08-26 Lingwei Zhu, Takamitsu Matsubara

Entropy regularization is used to improve optimization performance in reinforcement learning tasks. A common form of regularization is to maximize policy entropy to avoid premature convergence and lead to more stochastic policies for…

Machine Learning · Computer Science 2019-12-12 Riashat Islam, Zafarali Ahmed, Doina Precup

Augmenting the reward with entropy is known to soften the greedy argmax policy into a softmax policy. Entropy augmentation is reformulated and leads to a motivation to introduce an additional entropy term to the objective function in the form of…

Machine Learning · Computer Science 2020-06-08 Donghoon Lee

In this effort, we consider the impact of regularization on the diversity of actions taken by policies generated from reinforcement learning agents trained using a policy gradient. Policy gradient agents are prone to entropy collapse, which…

Machine Learning · Computer Science 2023-10-10 Andrew Starnes, Anton Dereventsov, Clayton Webster

High-level declarative constraints provide a powerful (and popular) way to define and construct control policies; however, most synthesis algorithms do not support specifying the degree of randomness (unpredictability) of the resulting…

Robotics · Computer Science 2021-06-30 Marcell Vazquez-Chanlatte, Sebastian Junges, Daniel J. Fremont, Sanjit Seshia

Identifying uncertainty and taking mitigating actions is crucial for safe and trustworthy reinforcement learning agents, especially when deployed in high-risk environments. In this paper, risk sensitivity is promoted in a model-based…

Machine Learning · Computer Science 2021-11-10 Stefan Radic Webster, Peter Flach

Entropic regularization of policies in Reinforcement Learning (RL) is a commonly used heuristic to ensure that the learned policy explores the state-space sufficiently before overfitting to a local optimal policy. The primary motivation for…

Machine Learning · Computer Science 2021-01-19 Hisham Husain, Kamil Ciosek, Ryota Tomioka

Reinforcement learning (RL) has become a key approach for enhancing reasoning in large language models (LLMs), yet scalable training is often hindered by the rapid collapse of policy entropy, which leads to premature convergence and…

Machine Learning · Computer Science 2026-04-14 Ming Lei, Christophe Baehr

Large Language Models (LLMs) that can express interpretable and calibrated uncertainty are crucial in high-stakes domains. While methods to compute uncertainty post-hoc exist, they are often sampling-based and therefore computationally…

Machine Learning · Computer Science 2026-03-09 Azza Jenane, Nassim Walha, Lukas Kuhn, Florian Buettner

Maximization of an expensive, unimodal function under random observations has been an important problem in hyperparameter tuning. It features expensive function evaluations (which means small budgets) and a high level of noise. We develop…

Optimization and Control · Mathematics 2023-02-23 Xiaohe Luo, Warren B. Powell