Related papers: Root Mean Square Layer Normalization

Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm

Layer normalization (LN) is a fundamental component in modern deep learning, but its per-sample centering and scaling introduce non-negligible inference overhead. RMSNorm improves efficiency by removing the centering operation, yet this may…

Machine Learning · Computer Science 2026-05-15 Yuxin Guo, Yihao Yue, Yunhao Ni, Yizhou Ruan, Jie Luo, Wenjun Wu, Lei Huang

Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers

Transformers have achieved great success in machine learning applications. Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root Mean Square Normalization (RMSNorm), play a critical role in accelerating and…

Machine Learning · Computer Science 2023-10-27 Zixuan Jiang, Jiaqi Gu, Hanqing Zhu, David Z. Pan

Layer Normalization

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the…

Machine Learning · Statistics 2016-07-22 Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

SeeDNorm: Self-Rescaled Dynamic Normalization

Normalization layer constitutes an essential component in neural networks. In transformers, the predominantly used RMSNorm constrains vectors to a unit hypersphere, followed by dimension-wise rescaling through a learnable scaling…

Machine Learning · Computer Science 2026-02-12 Wenrui Cai, Defa Zhu, Qingjie Liu, Qiyang Min

FlashNorm: Fast Normalization for Transformers

Normalization layers are ubiquitous in large language models (LLMs) yet represent a compute bottleneck: on hardware with distinct vector and matrix execution units, the RMS calculation blocks the subsequent matrix multiplication, preventing…

Machine Learning · Computer Science 2026-04-28 Nils Graef, Filip Makraduli, Andrew Wasielewski, Matthew Clapp

The Geometric Cost of Normalization: Affine Bounds on the Bayesian Complexity of Neural Networks

LayerNorm and RMSNorm impose fundamentally different geometric constraints on their outputs - and this difference has a precise, quantifiable consequence for model complexity. We prove that LayerNorm's mean-centering step, by confining data…

Machine Learning · Computer Science 2026-03-31 Sungbae Chun

Understanding and Improving Layer Normalization

Layer normalization (LayerNorm) is a technique to normalize the distributions of intermediate layers. It enables smoother gradients, faster training, and better generalization accuracy. However, it is still unclear where the effectiveness…

Machine Learning · Computer Science 2019-11-19 Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin

Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training

Normalization techniques have become a basic component in modern convolutional neural networks (ConvNets). In particular, many recent works demonstrate that promoting the orthogonality of the weights helps train deep models and improve…

Computer Vision and Pattern Recognition · Computer Science 2022-01-05 Sheng Liu, Xiao Li, Yuexiang Zhai, Chong You, Zhihui Zhu, Carlos Fernandez-Granda, Qing Qu

Geometric Interpretation of Layer Normalization and a Comparative Analysis with RMSNorm

This paper presents a novel geometric interpretation of LayerNorm and explores how LayerNorm influences the norm and orientation of hidden vectors in the representation space. With these geometric insights, we prepare the foundation for…

Machine Learning · Computer Science 2025-02-04 Akshat Gupta, Atahan Ozdemir, Gopala Anumanchipalli

MXNorm: Reusing MXFP block scales for efficient tensor normalisation

Matrix multiplication performance has long been the major bottleneck to scaling deep learning workloads, which has stimulated the design of new accelerators that use increasingly low-precision number formats. However, improvements in matrix…

Machine Learning · Computer Science 2026-03-16 Callum McLean, Luke Y. Prince, Alexandre Payot, Paul Balança, Carlo Luschi

MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization

Substantial experiments have validated the success of Batch Normalization (BN) Layer in benefiting convergence and generalization. However, BN requires extra memory and float-point calculation. Moreover, BN would be inaccurate on…

Machine Learning · Computer Science 2024-10-30 Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong

Regularizing Recurrent Networks - On Injected Noise and Norm-based Methods

Advancements in parallel processing have led to a surge in multilayer perceptrons' (MLP) applications and deep learning in the past decades. Recurrent Neural Networks (RNNs) give additional representational power to feedforward MLPs by…

Machine Learning · Statistics 2014-10-22 Saahil Ognawala, Justin Bayer

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties in BatchNorm to explain its success. However, given the pursuit of alternative…

Machine Learning · Computer Science 2021-10-27 Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

Neural Networks-based Regularization for Large-Scale Medical Image Reconstruction

In this paper we present a generalized Deep Learning-based approach for solving ill-posed large-scale inverse problems occurring in medical image reconstruction. Recently, Deep Learning methods using iterative neural networks and cascaded…

Image and Video Processing · Electrical Eng. & Systems 2020-08-26 Andreas Kofler, Markus Haltmeier, Tobias Schaeffter, Marc Kachelrieß, Marc Dewey, Christian Wald, Christoph Kolbitsch

How Does Batch Normalization Help Optimization?

Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly…

Machine Learning · Statistics 2019-04-16 Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry

Resurrecting Recurrent Neural Networks for Long Sequences

Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have…

Machine Learning · Computer Science 2023-03-14 Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

Regularization Learning Networks: Deep Learning for Tabular Datasets

Despite their impressive performance, Deep Neural Networks (DNNs) typically underperform Gradient Boosting Trees (GBTs) on many tabular-dataset learning tasks. We propose that applying a different regularization coefficient to each weight…

Machine Learning · Statistics 2018-10-25 Ira Shavitt, Eran Segal

Normalized Convolutional Neural Network

We introduce a Normalized Convolutional Neural Layer, a novel approach to normalization in convolutional networks. Unlike conventional methods, this layer normalizes the rows of the im2col matrix during convolution, making it inherently…

Computer Vision and Pattern Recognition · Computer Science 2025-04-03 Dongsuk Kim, Geonhee Lee, Myungjae Lee, Shin Uk Kang, Dongmin Kim

ReconResNet: Regularised Residual Learning for MR Image Reconstruction of Undersampled Cartesian and Radial Data

MRI is an inherently slow process, which leads to long scan time for high-resolution imaging. The speed of acquisition can be increased by ignoring parts of the data (undersampling). Consequently, this leads to the degradation of image…

Image and Video Processing · Electrical Eng. & Systems 2022-02-22 Soumick Chatterjee, Mario Breitkopf, Chompunuch Sarasaen, Hadya Yassin, Georg Rose, Andreas Nürnberger, Oliver Speck

Random Sequential Renormalization of Networks I: Application to Critical Trees

We introduce the concept of Random Sequential Renormalization (RSR) for arbitrary networks. RSR is a graph renormalization procedure that locally aggregates nodes to produce a coarse grained network. It is analogous to the (quasi-)parallel…

Statistical Mechanics · Physics 2011-03-24 Golnoosh Bizhani, Vishal Sood, Maya Paczuski, Peter Grassberger