Related papers: Exponential Moving Average Normalization for Self-…

Switch EMA: A Free Lunch for Better Flatness and Sharpness

Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization method for learning flat optima and improving generalization at no extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing…

Machine Learning · Computer Science · 2024-10-08 · Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

Batch Normalization (BN) is one of the most widely used techniques in deep learning, but its performance can degrade severely when the batch size is too small. This weakness limits the use of BN in many computer vision tasks like…

Computer Vision and Pattern Recognition · Computer Science · 2020-06-17 · Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun

Extended Batch Normalization

Batch normalization (BN) has become a standard technique for training modern deep networks. However, its effectiveness diminishes as the batch size shrinks, because the batch statistics are estimated less accurately. That…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-13 · Chunjie Luo, Jianfeng Zhan, Lei Wang, Wanling Gao

Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise

Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of an input image using the statistics of a batch of images, and hence BN introduces noise into the gradient of the training loss. Previous works indicate that the noise is…

Machine Learning · Computer Science · 2019-09-19 · Senwei Liang, Zhongzhan Huang, Mingfu Liang, Haizhao Yang

EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes

Stochasticity in language model fine-tuning, often caused by the small batch sizes typically used in this regime, can destabilize training by introducing large oscillations in generation quality. A popular approach to mitigating this…

Machine Learning · Computer Science · 2025-08-04 · Adam Block, Cyril Zhang

The Unusual Effectiveness of Averaging in GAN Training

We examine two different techniques for parameter averaging in GAN training. Moving Average (MA) computes the time-average of parameters, whereas Exponential Moving Average (EMA) computes an exponentially discounted sum. Whilst MA is known…

Machine Learning · Statistics · 2019-02-27 · Yasin Yazıcı, Chuan-Sheng Foo, Stefan Winkler, Kim-Hui Yap, Georgios Piliouras, Vijay Chandrasekhar
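
The distinction this abstract draws between MA (a time-average of parameters) and EMA (an exponentially discounted sum) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the scalar "parameters", the decay `beta`, and initializing the EMA at the first iterate are all assumptions made for illustration.

```python
def moving_average(iterates):
    """Uniform MA: every iterate seen so far gets equal weight 1/t."""
    avg = 0.0
    history = []
    for t, theta in enumerate(iterates, start=1):
        # Incremental form of the arithmetic mean of the first t iterates.
        avg += (theta - avg) / t
        history.append(avg)
    return history


def exponential_moving_average(iterates, beta=0.9):
    """EMA: older iterates are discounted geometrically by beta (assumed value)."""
    avg = iterates[0]  # assumption: start the EMA at the first iterate
    history = [avg]
    for theta in iterates[1:]:
        avg = beta * avg + (1.0 - beta) * theta
        history.append(avg)
    return history


if __name__ == "__main__":
    thetas = [1.0, 2.0, 3.0]  # hypothetical scalar parameter trajectory
    print(moving_average(thetas))
    print(exponential_moving_average(thetas, beta=0.9))
```

In practice the same two update rules would be applied elementwise to every weight tensor of the model rather than to a single scalar.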

Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation

Continual learning entails learning a sequence of tasks and balancing their knowledge appropriately. With limited access to old training samples, much of the current work in deep neural networks has focused on overcoming catastrophic…

Machine Learning · Computer Science · 2023-10-16 · Yilin Lyu, Liyuan Wang, Xingxing Zhang, Zicheng Sun, Hang Su, Jun Zhu, Liping Jing

Kaizen: Continuously improving teacher using Exponential Moving Average for semi-supervised speech recognition

In this paper, we introduce the Kaizen framework that uses a continuously improving teacher to generate pseudo-labels for semi-supervised speech recognition (ASR). The proposed approach uses a teacher model which is updated as the…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-10-28 · Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed

Double Forward Propagation for Memorized Batch Normalization

Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). Although the standard BN can significantly accelerate the training of DNNs and improve the generalization performance, it has several…

Machine Learning · Computer Science · 2020-10-13 · Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan

Switching Temporary Teachers for Semi-Supervised Semantic Segmentation

The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's. However, EMA updates raise a problem in that…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-31 · Jaemin Na, Jung-Woo Ha, Hyung Jin Chang, Dongyoon Han, Wonjun Hwang

Supervised Batch Normalization

Batch Normalization (BN), a widely used technique in neural networks, enhances generalization and expedites training by normalizing each mini-batch to the same mean and variance. However, its effectiveness diminishes when confronted with…

Machine Learning · Computer Science · 2024-05-28 · Bilal Faye, Mustapha Lebbah, Hanane Azzag

Exponential weight averaging as damped harmonic motion

The exponential moving average (EMA) is a commonly used statistic for providing stable estimates of stochastic quantities in deep learning optimization. Recently, EMA has seen considerable use in generative models, where it is computed with…

Machine Learning · Computer Science · 2023-10-24 · Jonathan Patsenker, Henry Li, Yuval Kluger

Exemplar Normalization for Learning Deep Representation

Normalization techniques are important in different advanced neural networks and different tasks. This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-23 · Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo

Understanding Batch Normalization

Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks. Its tendency to improve accuracy and speed up training has established BN as a favorite technique in deep learning. Yet,…

Machine Learning · Computer Science · 2018-12-03 · Johan Bjorck, Carla Gomes, Bart Selman, Kilian Q. Weinberger

Continual Normalization: Rethinking Batch Normalization for Online Continual Learning

Existing continual learning methods use Batch Normalization (BN) to facilitate training and improve generalization across tasks. However, the non-i.i.d. and non-stationary nature of continual learning data, especially in the online setting,…

Machine Learning · Computer Science · 2022-03-31 · Quang Pham, Chenghao Liu, Steven Hoi

EvalNorm: Estimating Batch Normalization Statistics for Evaluation

Batch normalization (BN) has been very effective for deep learning and is widely used. However, when training with small minibatches, models using BN exhibit a significant degradation in performance. In this paper we study this peculiar…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-15 · Saurabh Singh, Abhinav Shrivastava

Generalized Batch Normalization: Towards Accelerating Deep Neural Networks

Utilizing recently introduced concepts from statistics and quantitative risk management, we present a general variant of Batch Normalization (BN) that offers accelerated convergence of Neural Network training compared to conventional BN. In…

Machine Learning · Computer Science · 2018-12-11 · Xiaoyong Yuan, Zheng Feng, Matthew Norton, Xiaolin Li

Diminishing Batch Normalization

In this paper, we propose a generalization of the Batch Normalization (BN) algorithm, diminishing batch normalization (DBN), in which the BN parameters are updated with a diminishing moving average. BN is very effective in accelerating the…

Machine Learning · Computer Science · 2019-02-20 · Yintai Ma, Diego Klabjan

Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits

Weight averaging of Stochastic Gradient Descent (SGD) iterates is a popular method for training deep learning models. While it is often used as part of complex training pipelines to improve generalization or serve as a 'teacher' model,…

Machine Learning · Computer Science · 2024-12-02 · Daniel Morales-Brotons, Thijs Vogels, Hadrien Hendrikx

Mode Normalization

Normalization methods are a central building block in the deep learning toolbox. They accelerate and stabilize training, while decreasing the dependence on manually tuned learning rate schedules. When learning from multi-modal…

Machine Learning · Computer Science · 2018-10-15 · Lucas Deecke, Iain Murray, Hakan Bilen