Related papers: Alignment Entropy Regularization

200 papers

Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower-variance estimates of the mean reward in the optimize-and-estimate…

Machine Learning · Computer Science 2022-08-26 Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e. over-confident) predictions, a common sign of overfitting. This class of techniques, of which label smoothing is one,…

Computation and Language · Computer Science 2020-05-13 Clara Meister, Elizabeth Salesky, Ryan Cotterell

Reasoning ability has become a defining capability of Large Language Models (LLMs), with Reinforcement Learning with Verifiable Rewards (RLVR) emerging as a key paradigm to enhance it. However, RLVR training often suffers from policy…

Machine Learning · Computer Science 2026-04-20 Xiaoyun Zhang, Xiaojian Yuan, Di Huang, Wang You, Chen Hu, Jingqing Ruan, Ai Jian, Kejiang Chen, Xing Hu

State entropy regularization has empirically shown better exploration and sample complexity in reinforcement learning (RL). However, its theoretical guarantees have not been studied. In this paper, we show that state entropy regularization…

Machine Learning · Computer Science 2025-12-02 Yonatan Ashlag, Uri Koren, Mirco Mutti, Esther Derman, Pierre-Luc Bacon, Shie Mannor

We study the problem of entropy calibration, which asks whether a language model's entropy over generations matches its log loss on human text. Past work found that models are miscalibrated, with entropy per step increasing as generations…

Computation and Language · Computer Science 2026-01-14 Steven Cao, Gregory Valiant, Percy Liang

Integrating large language models (LLMs) into automatic speech recognition (ASR) has become a dominant paradigm. Although recent LLM-based ASR models have shown promising performance on public benchmarks, it remains challenging to balance…

Audio and Speech Processing · Electrical Eng. & Systems 2026-04-10 Yuan Xie, Jiaqi Song, Guang Qiu, Xianliang Wang, Ming Lei, Jie Gao, Jie Wu

Entropy regularization is used to get improved optimization performance in reinforcement learning tasks. A common form of regularization is to maximize policy entropy to avoid premature convergence and lead to more stochastic policies for…

Machine Learning · Computer Science 2019-12-12 Riashat Islam, Zafarali Ahmed, Doina Precup

Consistency regularization is a commonly used practice to encourage the model to generate consistent representation from distorted input features and improve model generalization. It shows significant improvement on various speech…

Computation and Language · Computer Science 2024-11-12 Cindy Tseng, Yun Tang, Vijendra Raj Apsingekar

In streaming settings, speech recognition models have to map sub-sequences of speech to text before the full audio stream becomes available. However, since alignment information between speech and text is rarely available during training,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-12-20 Oscar Chang, Dongseong Hwang, Olivier Siohan

Consistency regularization is a commonly used technique for semi-supervised and self-supervised learning. It is an auxiliary objective function that encourages the prediction of the network to be similar in the vicinity of the observed…

Machine Learning · Computer Science 2021-10-05 Erik Englesson, Hossein Azizpour

We introduce a method to measure uncertainty in large language models. For tasks like question answering, it is essential to know when we can trust the natural language outputs of foundation models. We show that measuring uncertainty in…

Computation and Language · Computer Science 2023-04-18 Lorenz Kuhn, Yarin Gal, Sebastian Farquhar

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim…

Machine Learning · Computer Science 2019-06-11 Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible while constrained to match empirically estimated feature expectations. However, in many real-world…

Machine Learning · Computer Science 2022-08-16 Kenneth Bogert, Yikang Gui, Prashant Doshi

Chinese text recognition is more challenging than Latin text due to the large amount of fine-grained Chinese characters and the great imbalance over classes, which causes a serious overfitting problem. We propose to apply Maximum Entropy…

Computer Vision and Pattern Recognition · Computer Science 2020-07-10 Changxu Cheng, Wuheng Xu, Xiang Bai, Bin Feng, Wenyu Liu

In this work, we introduce Entropy Area Score (EAS), a simple yet effective metric to quantify uncertainty in the answer generation process of reasoning large language models (LLMs). EAS requires neither external models nor repeated…

Artificial Intelligence · Computer Science 2025-08-29 Yongfu Zhu, Lin Sun, Guangxiang Zhao, Weihong Lin, Xiangzheng Zhang

Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated…

Machine Learning · Computer Science 2021-02-23 Jeffrey Willette, Juho Lee, Sung Ju Hwang

For RL algorithms, appropriate entropy control is crucial to their effectiveness. To control the policy entropy, a commonly used method is entropy regularization, which is adopted in various popular RL algorithms including PPO, SAC and A3C.…

Machine Learning · Computer Science 2026-02-06 Han Shen

In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-19 SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo

Reinforcement learning (RL) has become a key approach for enhancing reasoning in large language models (LLMs), yet scalable training is often hindered by the rapid collapse of policy entropy, which leads to premature convergence and…

Machine Learning · Computer Science 2026-04-14 Ming Lei, Christophe Baehr

The problem addressed concerns the determination of the average number of successive attempts of guessing a word of a certain length consisting of letters with given probabilities of occurrence. Both first- and second-order approximations…

Information Theory · Computer Science 2015-06-19 Kerstin Andersson