Related papers: Trainable Weight Averaging: Accelerating Training …

Averaging Weights Leads to Wider Optima and Better Generalization

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD,…

Machine Learning · Computer Science 2019-02-26 Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson

Adaptive Stochastic Weight Averaging

Ensemble models often improve generalization performances in challenging tasks. Yet, traditional techniques based on prediction averaging incur three well-known disadvantages: the computational overhead of training multiple models,…

Machine Learning · Computer Science 2024-06-28 Caglar Demir, Arnab Sharma, Axel-Cyrille Ngonga Ngomo

Hierarchical Weight Averaging for Deep Neural Networks

Despite the simplicity, stochastic gradient descent (SGD)-like algorithms are successful in training deep neural networks (DNNs). Among various attempts to improve SGD, weight averaging (WA), which averages the weights of multiple models,…

Machine Learning · Computer Science 2023-04-25 Xiaozhe Gu, Zixun Zhang, Yuncheng Jiang, Tao Luo, Ruimao Zhang, Shuguang Cui, Zhen Li

Adversarial Training with Stochastic Weight Average

Adversarial training deep neural networks often experience serious overfitting problem. Recently, it is explained that the overfitting happens because the sample complexity of training data is insufficient to generalize robustness. In…

Machine Learning · Computer Science 2020-09-23 Joong-Won Hwang, Youngwan Lee, Sungchan Oh, Yuseok Bae

SeWA: Selective Weight Average via Probabilistic Masking

Weight averaging has become a standard technique for enhancing model performance. However, methods such as Stochastic Weight Averaging (SWA) and Latest Weight Averaging (LAWA) often require manually designed procedures to sample from the…

Machine Learning · Computer Science 2025-02-17 Peng Wang, Shengchao Hu, Zerui Tao, Guoxia Wang, Dianhai Yu, Li Shen, Quan Zheng, Dacheng Tao

SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks

Designing a deep neural network (DNN) with good generalization capability is a complex process especially when the weights are severely quantized. Model averaging is a promising approach for achieving the good generalization capability of…

Machine Learning · Computer Science 2020-02-04 Sungho Shin, Yoonho Boo, Wonyong Sung

Stochastic Weight Averaging Revisited

Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective approach to assist the backbone SGD in finding better optima, in terms of generalization. From a statistical perspective,…

Machine Learning · Computer Science 2022-09-20 Hao Guo, Jiyong Jin, Bin Liu

Diverse Weight Averaging for Out-of-Distribution Generalization

Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA)…

Computer Vision and Pattern Recognition · Computer Science 2023-01-30 Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, Alain Rakotomamonjy, Patrick Gallinari, Matthieu Cord

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple…

Machine Learning · Computer Science 2020-01-09 Vipul Gupta, Santiago Akle Serrano, Dennis DeCoste

IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks

Model Weight Averaging (MWA) is a technique that seeks to enhance model's performance by averaging the weights of multiple trained models. This paper first empirically finds that 1) the vanilla MWA can benefit the class-imbalanced learning,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 Zitong Huang, Ze Chen, Bowen Dong, Chaoqi Liang, Erjin Zhou, Wangmeng Zuo

WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of…

Machine Learning · Computer Science 2024-05-29 Louis Fournier, Adel Nabli, Masih Aminbeidokhti, Marco Pedersoli, Eugene Belilovsky, Edouard Oyallon

A Unified Analysis for Finite Weight Averaging

Averaging iterations of Stochastic Gradient Descent (SGD) have achieved empirical success in training deep learning models, such as Stochastic Weight Averaging (SWA), Exponential Moving Average (EMA), and LAtest Weight Averaging (LAWA).…

Machine Learning · Computer Science 2024-11-21 Peng Wang, Li Shen, Zerui Tao, Yan Sun, Guodong Zheng, Dacheng Tao

There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average

Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters. To understand consistency…

Machine Learning · Computer Science 2019-02-22 Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

Sample Weight Averaging for Stable Prediction

The challenge of Out-of-Distribution (OOD) generalization poses a foundational concern for the application of machine learning algorithms to risk-sensitive areas. Inspired by traditional importance weighting and propensity weighting…

Machine Learning · Computer Science 2025-02-12 Han Yu, Yue He, Renzhe Xu, Dongbai Li, Jiayin Zhang, Wenchao Zou, Peng Cui

Weight Prediction Boosts the Convergence of AdamW

In this paper, we introduce weight prediction into the AdamW optimizer to boost its convergence when training the deep neural network (DNN) models. In particular, ahead of each mini-batch training, we predict the future weights according to…

Machine Learning · Computer Science 2023-08-09 Lei Guan

Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks

Robust validation metrics remain essential in contemporary deep learning, not only to detect overfitting and poor generalization, but also to monitor training dynamics. In the supervised classification setting, we investigate whether…

Machine Learning · Computer Science 2025-10-30 Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis

Learning to Auto Weight: Entirely Data-driven and Highly Efficient Weighting Framework

Example weighting algorithm is an effective solution to the training bias problem, however, most previous typical methods are usually limited to human knowledge and require laborious tuning of hyperparameters. In this paper, we propose a…

Machine Learning · Computer Science 2019-11-27 Zhenmao Li, Yichao Wu, Ken Chen, Yudong Wu, Shunfeng Zhou, Jiaheng Liu, Junjie Yan

Weighted Training for Cross-Task Learning

In this paper, we introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning based on minimizing a representation-based task distance between the source and target tasks. We show that TAWT is easy…

Machine Learning · Computer Science 2022-03-02 Shuxiao Chen, Koby Crammer, Hangfeng He, Dan Roth, Weijie J. Su

LiLAW: Lightweight Learnable Adaptive Weighting to Learn Sample Difficulty & Improve Noisy Training

Training deep neural networks with noise and data heterogeneity is a major challenge. We introduce Lightweight Learnable Adaptive Weighting (LiLAW), a method that dynamically adjusts the loss weight of each training sample based on its…

Machine Learning · Computer Science 2026-05-14 Abhishek Moturu, Muhammad Muzammil, Anna Goldenberg, Babak Taati

When, Where and Why to Average Weights?

Averaging checkpoints along the training trajectory is a simple yet powerful approach to improve the generalization performance of Machine Learning models and reduce training time. Motivated by these potential gains, and in an effort to…

Machine Learning · Computer Science 2025-11-25 Niccolò Ajroldi, Antonio Orvieto, Jonas Geiping