Related papers: Implicit Regularization and Convergence for Weight…

Weight Compander: A Simple Weight Reparameterization for Regularization

Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel effective method to improve generalization by reparameterizing…

Machine Learning · Computer Science 2023-06-30 Rinor Cakaj , Jens Mehnert , Bin Yang

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning…

Machine Learning · Computer Science 2016-06-07 Tim Salimans , Diederik P. Kingma

Projection Based Weight Normalization for Deep Neural Networks

Optimizing deep neural networks (DNNs) often suffers from the ill-conditioned problem. We observe that the scaling-based weight space symmetry property in rectified nonlinear network will cause this negative effect. Therefore, we propose to…

Machine Learning · Computer Science 2017-10-09 Lei Huang , Xianglong Liu , Bo Lang , Bo Li

Weight Rescaling: Effective and Robust Regularization for Deep Neural Networks with Batch Normalization

Weight decay is often used to ensure good generalization in the training practice of deep neural networks with batch normalization (BN-DNNs), where some convolution layers are invariant to weight rescaling due to the normalization. In this…

Machine Learning · Computer Science 2022-06-22 Ziquan Liu , Yufei Cui , Jia Wan , Yu Mao , Antoni B. Chan

New Interpretations of Normalization Methods in Deep Learning

In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), group normalization (GN), etc. However,…

Machine Learning · Computer Science 2020-06-17 Jiacheng Sun , Xiangyong Cao , Hanwen Liang , Weiran Huang , Zewei Chen , Zhenguo Li

Robust Implicit Regularization via Weight Normalization

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line…

Machine Learning · Computer Science 2024-09-18 Hung-Hsu Chou , Holger Rauhut , Rachel Ward

PathProx: A Proximal Gradient Algorithm for Weight Decay Regularized Deep Neural Networks

Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional…

Machine Learning · Computer Science 2023-07-07 Liu Yang , Jifan Zhang , Joseph Shenouda , Dimitris Papailiopoulos , Kangwook Lee , Robert D. Nowak

On the Benefits of Weight Normalization for Overparameterized Matrix Sensing

While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized…

Machine Learning · Computer Science 2025-10-02 Yudong Wei , Liang Zhang , Bingcong Li , Niao He

Minimum weight norm models do not always generalize well for over-parameterized problems

This work is substituted by the paper in arXiv:2011.14066. Stochastic gradient descent is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires fine tuning in order to achieve its best…

Machine Learning · Statistics 2020-12-02 Vatsal Shah , Anastasios Kyrillidis , Sujay Sanghavi

Explicit regularization and implicit bias in deep network classifiers trained with the square loss

Deep ReLU networks trained with the square loss have been observed to perform well in classification tasks. We provide here a theoretical justification based on analysis of the associated gradient flow. We show that convergence to a…

Machine Learning · Computer Science 2021-01-05 Tomaso Poggio , Qianli Liao

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that compared with (stochastic) gradient descent, Adam can converge to a different solution with a…

Machine Learning · Computer Science 2021-08-26 Difan Zou , Yuan Cao , Yuanzhi Li , Quanquan Gu

Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification

Batch normalization (BN) has become a de facto standard for training deep convolutional networks. However, BN accounts for a significant fraction of training run-time and is difficult to accelerate, since it is a memory-bandwidth bounded…

Computer Vision and Pattern Recognition · Computer Science 2017-10-10 Igor Gitman , Boris Ginsburg

Understanding Weight Similarity of Neural Networks via Chain Normalization Rule and Hypothesis-Training-Testing

We present a weight similarity measure method that can quantify the weight similarity of non-convex neural networks. To understand the weight similarity of different trained models, we propose to extract the feature representation from the…

Machine Learning · Computer Science 2022-08-10 Guangcong Wang , Guangrun Wang , Wenqi Liang , Jianhuang Lai

Function Norms and Regularization in Deep Networks

Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of…

Machine Learning · Computer Science 2018-07-02 Amal Rannen Triki , Maxim Berman , Matthew B. Blaschko

Explicit Regularization via Regularizer Mirror Descent

Despite perfectly interpolating the training data, deep neural networks (DNNs) can often generalize fairly well, in part due to the "implicit regularization" induced by the learning algorithm. Nonetheless, various forms of regularization,…

Machine Learning · Computer Science 2022-02-23 Navid Azizan , Sahin Lale , Babak Hassibi

Weight Conditioning for Smooth Optimization of Neural Networks

In this article, we introduce a novel normalization technique for neural network weight matrices, which we term weight conditioning. This approach aims to narrow the gap between the smallest and largest singular values of the weight…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Hemanth Saratchandran , Thomas X. Wang , Simon Lucey

On the Convex Geometry of Weighted Nuclear Norm Minimization

Low-rank matrix approximation, which aims to construct a low-rank matrix from an observation, has received much attention recently. An efficient method to solve this problem is to convert the problem of rank minimization into a nuclear norm…

Information Theory · Computer Science 2016-09-21 Seyedroohollah Hosseini

Understanding the Disharmony between Weight Normalization Family and Weight Decay: $\epsilon-$shifted $L_2$ Regularizer

The merits of fast convergence and potentially better performance of the weight normalization family have drawn increasing attention in recent years. These methods use standardization or normalization that changes the weight…

Machine Learning · Computer Science 2019-11-15 Li Xiang , Chen Shuo , Xia Yan , Yang Jian

Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

Normalization operations are essential for state-of-the-art neural networks and enable us to train a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the…

Computer Vision and Pattern Recognition · Computer Science 2021-03-23 Yuxiang Liu , Jidong Ge , Chuanyi Li , Jie Gui

Optimization Theory for ReLU Neural Networks Trained with Normalization Layers

The success of deep neural networks is in part due to the use of normalization layers. Normalization layers like Batch Normalization, Layer Normalization and Weight Normalization are ubiquitous in practice, as they improve generalization…

Machine Learning · Computer Science 2020-06-15 Yonatan Dukler , Quanquan Gu , Guido Montúfar