Related papers: KL Regularized Normalization Framework for Low Res…

Robust Deep Joint Source Channel Coding for Task-Oriented Semantic Communications

Semantic communications based on deep joint source-channel coding (JSCC) aim to improve communication efficiency by transmitting only task-relevant information. However, ensuring robustness to the stochasticity of communication channels…

Signal Processing · Electrical Eng. & Systems 2025-03-18 Taewoo Park, Eunhye Hong, Yo-Seb Jeon, Namyoon Lee, Yongjune Kim

Query-Key Normalization for Transformers

Low-resource language translation is a challenging but socially valuable NLP task. Building on recent work adapting the Transformer's normalization to this setting, we propose QKNorm, a normalization technique that modifies the attention…

Computation and Language · Computer Science 2020-10-12 Alex Henry, Prudhvi Raj Dachapally, Shubham Pawar, Yuxuan Chen

Leverage the Average: an Analysis of KL Regularization in RL

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance. Yet, only little is understood theoretically about why KL regularization helps, so far.…

Machine Learning · Computer Science 2021-01-07 Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Regularization Matters in Policy Optimization

Deep Reinforcement Learning (Deep RL) has been receiving increasingly more attention thanks to its encouraging performance on a variety of control tasks. Yet, conventional regularization techniques in training neural networks (e.g., $L_2$…

Machine Learning · Computer Science 2021-11-30 Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell

Regularizing Neural Machine Translation by Target-bidirectional Agreement

Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming as in other sequence generation tasks: errors made early in generation…

Computation and Language · Computer Science 2018-11-14 Zhirui Zhang, Shuangzhi Wu, Shujie Liu, Mu Li, Ming Zhou, Tong Xu

KL Guided Domain Adaptation

Domain adaptation is an important problem and often needed for real-world applications. In this problem, instead of i.i.d. training and testing datapoints, we assume that the source (training) data and the target (testing) data have…

Machine Learning · Computer Science 2022-03-15 A. Tuan Nguyen, Toan Tran, Yarin Gal, Philip H. S. Torr, Atılım Güneş Baydin

Learning in Compact Spaces with Approximately Normalized Transformer

The successful training of deep neural networks requires addressing challenges such as overfitting, numerical instabilities leading to divergence, and increasing variance in the residual stream. A common solution is to apply regularization…

Machine Learning · Computer Science 2025-11-20 Jörg K. H. Franke, Urs Spiegelhalter, Marianna Nezhurina, Jenia Jitsev, Frank Hutter, Michael Hefenbrock

Regularized Training of Nearest Neighbor Language Models

Including memory banks in a natural language processing architecture increases model capacity by equipping it with additional data at inference time. In this paper, we build upon $k$NN-LM (Khandelwal et al., 2020), which uses a…

Computation and Language · Computer Science 2021-09-20 Jean-Francois Ton, Walter Talbott, Shuangfei Zhai, Josh Susskind

Large Language Models as Attribution Regularizers for Efficient Model Training

Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains. However, effectively leveraging their vast knowledge for training smaller downstream models remains an open challenge, especially in domains like…

Machine Learning · Computer Science 2025-07-28 Davor Vukadin, Marin Šilić, Goran Delač

A Comedy of Estimators: On KL Regularization in RL Training of LLMs

The reasoning performance of large language models (LLMs) can be substantially improved by training them with reinforcement learning (RL). The RL objective for LLM training involves a regularization term, which is the reverse…

Machine Learning · Computer Science 2026-03-19 Vedant Shah, Johan Obando-Ceron, Vineet Jain, Brian Bartoldson, Bhavya Kailkhura, Sarthak Mittal, Glen Berseth, Pablo Samuel Castro, Yoshua Bengio, Nikolay Malkin, Moksh Jain, Siddarth Venkatraman, Aaron Courville

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF

Reverse-Kullback-Leibler (KL) regularization has emerged to be a predominant technique used to enhance policy optimization in reinforcement learning (RL) and reinforcement learning from human feedback (RLHF), which forces the learned policy…

Machine Learning · Computer Science 2025-02-12 Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang

PowerNorm: Rethinking Batch Normalization in Transformers

The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN). This is different than batch normalization (BN), which is widely-adopted in Computer Vision. The…

Computation and Language · Computer Science 2021-04-21 Sheng Shen, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization

The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on…

Computation and Language · Computer Science 2023-11-13 Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

Efficient Bayesian Sampling Using Normalizing Flows to Assist Markov Chain Monte Carlo Methods

Normalizing flows can generate complex target distributions and thus show promise in many applications in Bayesian statistics as an alternative or complement to MCMC for sampling posteriors. Since no data set from the target posterior…

Machine Learning · Statistics 2021-07-19 Marylou Gabrié, Grant M. Rotskoff, Eric Vanden-Eijnden

Regularization Advantages of Multilingual Neural Language Models for Low Resource Domains

Neural language modeling (LM) has led to significant improvements in several applications, including Automatic Speech Recognition. However, they typically require large amounts of training data, which is not available for many domains and…

Computation and Language · Computer Science 2019-06-05 Navid Rekabsaz, Nikolaos Pappas, James Henderson, Banriskhem K. Khonglah, Srikanth Madikeri

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

Kullback-Leibler (KL) regularization is ubiquitous in reinforcement learning algorithms in the form of reverse or forward KL. Recent studies have demonstrated $\epsilon^{-1}$-type fast rates for decision making under…

Machine Learning · Computer Science 2026-05-12 Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Quanquan Gu

Deep Adaptive Input Normalization for Time Series Forecasting

Deep Learning (DL) models can be used to tackle time series analysis tasks with great success. However, the performance of DL models can degenerate rapidly if the data are not appropriately normalized. This issue is even more apparent when…

Computational Finance · Quantitative Finance 2019-09-24 Nikolaos Passalis, Anastasios Tefas, Juho Kanniainen, Moncef Gabbouj, Alexandros Iosifidis

Stochastic Function Norm Regularization of Deep Networks

Deep neural networks have had an enormous impact on image analysis. State-of-the-art training methods, based on weight decay and DropOut, result in impressive performance when a very large training set is available. However, they tend to…

Machine Learning · Computer Science 2019-09-02 Amal Rannen Triki, Matthew B. Blaschko

Well-Posed KL-Regularized Control via Wasserstein and Kalman-Wasserstein KL Divergences

Kullback-Leibler divergence (KL) regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise limits. Utilizing a unified information-geometric framework, we…

Optimization and Control · Mathematics 2026-02-03 Viktor Stein, Adwait Datar, Nihat Ay

Optimising Language Models for Downstream Tasks: A Post-Training Perspective

Language models (LMs) have demonstrated remarkable capabilities in NLP, yet adapting them efficiently and robustly to specific tasks remains challenging. As their scale and complexity grow, fine-tuning LMs on labelled data often…

Computation and Language · Computer Science 2025-06-27 Zhengyan Shi