Related papers: Understanding and Improving Layer Normalization

200 papers

Layer normalization (LN) is a fundamental component in modern deep learning, but its per-sample centering and scaling introduce non-negligible inference overhead. RMSNorm improves efficiency by removing the centering operation, yet this may…

Machine Learning · Computer Science 2026-05-15 Yuxin Guo, Yihao Yue, Yunhao Ni, Yizhou Ruan, Jie Luo, Wenjun Wu, Lei Huang

Layer normalization (LayerNorm) has been successfully applied to various deep neural networks to help stabilize training and boost model convergence because of its capability in handling re-centering and re-scaling of both inputs and weight…

Machine Learning · Computer Science 2019-10-17 Biao Zhang, Rico Sennrich

LayerNorm is pivotal in Vision Transformers (ViTs), yet its fine-tuning dynamics under data scarcity and domain shifts remain underexplored. This paper shows that shifts in LayerNorm parameters after fine-tuning (LayerNorm shifts) are…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Zhaorui Tan, Tan Pan, Kaizhu Huang, Weimiao Yu, Kai Yao, Chen Jiang, Qiufeng Wang, Anh Nguyen, Xin Guo, Yuan Cheng, Xi Yang

This paper presents a novel geometric interpretation of LayerNorm and explores how LayerNorm influences the norm and orientation of hidden vectors in the representation space. With these geometric insights, we prepare the foundation for…

Machine Learning · Computer Science 2025-02-04 Akshat Gupta, Atahan Ozdemir, Gopala Anumanchipalli

Layer Normalization (LayerNorm) is an inherent component in all Transformer-based models. In this paper, we show that LayerNorm is crucial to the expressivity of the multi-head attention layer that follows it. This is in contrast to the…

Machine Learning · Computer Science 2023-05-12 Shaked Brody, Uri Alon, Eran Yahav

Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization…

Computer Vision and Pattern Recognition · Computer Science 2018-07-16 Zhao Chen, Vijay Badrinarayanan, Chen-Yu Lee, Andrew Rabinovich

Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly…

Machine Learning · Statistics 2019-04-16 Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties in BatchNorm to explain its success. However, given the pursuit of alternative…

Machine Learning · Computer Science 2021-10-27 Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with LayerNorm at the input of layers (PreNorm) set as the…

Computation and Language · Computer Science 2023-05-17 Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, Sadao Kurohashi

Layer Normalization (LayerNorm) is one of the fundamental components in transformers that stabilizes training and improves optimization. In recent times, Pre-LayerNorm transformers have become the preferred choice over Post-LayerNorm…

Machine Learning · Computer Science 2025-11-14 Rishi Singhal, Jung-Eun Kim

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the…

Machine Learning · Statistics 2016-07-22 Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

Transformers have achieved great success in machine learning applications. Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root Mean Square Normalization (RMSNorm), play a critical role in accelerating and…

Machine Learning · Computer Science 2023-10-27 Zixuan Jiang, Jiaqi Gu, Hanqing Zhu, David Z. Pan

A technical note aiming to offer deeper intuition for the LayerNorm function common in deep neural networks. LayerNorm is defined relative to a distinguished 'neural' basis, but it does more than just normalize the corresponding vector…

Machine Learning · Computer Science 2024-05-08 Paul M. Riechers

The problem of multi-domain learning of deep networks is considered. An adaptive layer is induced per target domain and a novel procedure, denoted covariance normalization (CovNorm), proposed to reduce its parameters. CovNorm is a data…

Computer Vision and Pattern Recognition · Computer Science 2019-06-26 Yunsheng Li, Nuno Vasconcelos

The placement of normalization layers, specifically Pre-Norm and Post-Norm, remains an open question in Transformer architecture design. In this work, we rethink these approaches through the lens of manifold optimization, interpreting the…

Normalization layer constitutes an essential component in neural networks. In transformers, the predominantly used RMSNorm constrains vectors to a unit hypersphere, followed by dimension-wise rescaling through a learnable scaling…

Machine Learning · Computer Science 2026-02-12 Wenrui Cai, Defa Zhu, Qingjie Liu, Qiyang Min

Normalization techniques have only recently begun to be exploited in supervised learning tasks. Batch normalization exploits mini-batch statistics to normalize the activations. This was shown to speed up training and result in better…

Machine Learning · Computer Science 2017-03-08 Mengye Ren, Renjie Liao, Raquel Urtasun, Fabian H. Sinz, Richard S. Zemel

Normalization like Batch Normalization (BN) is a milestone technique to normalize the distributions of intermediate layers in deep learning, enabling faster training and better generalization accuracy. However, in fidelity image…

Image and Video Processing · Electrical Eng. & Systems 2021-11-30 Jie Liu, Jie Tang, Gangshan Wu

Normalization is known to help the optimization of deep neural networks. Curiously, different architectures require specialized normalization methods. In this paper, we study what normalization is effective for Graph Neural Networks (GNNs).…

Machine Learning · Computer Science 2021-06-14 Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang

Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help generalization, even in not-so-deep nets. Motivated by the long-held…

Machine Learning · Computer Science 2023-01-18 Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora