English
Related papers

Related papers: Implicit Regularization and Convergence for Weight…

200 papers

Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel effective method to improve generalization by reparameterizing…

Machine Learning · Computer Science 2023-06-30 Rinor Cakaj , Jens Mehnert , Bin Yang

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning…

Machine Learning · Computer Science 2016-06-07 Tim Salimans , Diederik P. Kingma

Optimizing deep neural networks (DNNs) often suffers from the ill-conditioned problem. We observe that the scaling-based weight space symmetry property in rectified nonlinear network will cause this negative effect. Therefore, we propose to…

Machine Learning · Computer Science 2017-10-09 Lei Huang , Xianglong Liu , Bo Lang , Bo Li

Weight decay is often used to ensure good generalization in the training practice of deep neural networks with batch normalization (BN-DNNs), where some convolution layers are invariant to weight rescaling due to the normalization. In this…

Machine Learning · Computer Science 2022-06-22 Ziquan Liu , Yufei Cui , Jia Wan , Yu Mao , Antoni B. Chan

In recent years, a variety of normalization methods have been proposed to help train neural networks, such as batch normalization (BN), layer normalization (LN), weight normalization (WN), group normalization (GN), etc. However,…

Machine Learning · Computer Science 2020-06-17 Jiacheng Sun , Xiangyong Cao , Hanwen Liang , Weiran Huang , Zewei Chen , Zhenguo Li

Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line…

Machine Learning · Computer Science 2024-09-18 Hung-Hsu Chou , Holger Rauhut , Rachel Ward

Weight decay is one of the most widely used forms of regularization in deep learning, and has been shown to improve generalization and robustness. The optimization objective driving weight decay is a sum of losses plus a term proportional…

Machine Learning · Computer Science 2023-07-07 Liu Yang , Jifan Zhang , Joseph Shenouda , Dimitris Papailiopoulos , Kangwook Lee , Robert D. Nowak

While normalization techniques are widely used in deep learning, their theoretical understanding remains relatively limited. In this work, we establish the benefits of (generalized) weight normalization (WN) applied to the overparameterized…

Machine Learning · Computer Science 2025-10-02 Yudong Wei , Liang Zhang , Bingcong Li , Niao He

This work is substituted by the paper in arXiv:2011.14066. Stochastic gradient descent is the de facto algorithm for training deep neural networks (DNNs). Despite its popularity, it still requires fine tuning in order to achieve its best…

Machine Learning · Statistics 2020-12-02 Vatsal Shah , Anastasios Kyrillidis , Sujay Sanghavi

Deep ReLU networks trained with the square loss have been observed to perform well in classification tasks. We provide here a theoretical justification based on analysis of the associated gradient flow. We show that convergence to a…

Machine Learning · Computer Science 2021-01-05 Tomaso Poggio , Qianli Liao

Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that compared with (stochastic) gradient descent, Adam can converge to a different solution with a…

Machine Learning · Computer Science 2021-08-26 Difan Zou , Yuan Cao , Yuanzhi Li , Quanquan Gu

Batch normalization (BN) has become a de facto standard for training deep convolutional networks. However, BN accounts for a significant fraction of training run-time and is difficult to accelerate, since it is a memory-bandwidth bounded…

Computer Vision and Pattern Recognition · Computer Science 2017-10-10 Igor Gitman , Boris Ginsburg

We present a weight similarity measure method that can quantify the weight similarity of non-convex neural networks. To understand the weight similarity of different trained models, we propose to extract the feature representation from the…

Machine Learning · Computer Science 2022-08-10 Guangcong Wang , Guangrun Wang , Wenqi Liang , Jianhuang Lai

Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of…

Machine Learning · Computer Science 2018-07-02 Amal Rannen Triki , Maxim Berman , Matthew B. Blaschko

Despite perfectly interpolating the training data, deep neural networks (DNNs) can often generalize fairly well, in part due to the "implicit regularization" induced by the learning algorithm. Nonetheless, various forms of regularization,…

Machine Learning · Computer Science 2022-02-23 Navid Azizan , Sahin Lale , Babak Hassibi

In this article, we introduce a novel normalization technique for neural network weight matrices, which we term weight conditioning. This approach aims to narrow the gap between the smallest and largest singular values of the weight…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Hemanth Saratchandran , Thomas X. Wang , Simon Lucey

Low-rank matrix approximation, which aims to construct a low-rank matrix from an observation, has received much attention recently. An efficient method to solve this problem is to convert the problem of rank minimization into a nuclear norm…

Information Theory · Computer Science 2016-09-21 Seyedroohollah Hosseini

The merits of fast convergence and potentially better performance of the weight normalization family have drawn increasing attention in recent years. These methods use standardization or normalization that changes the weight…

Machine Learning · Computer Science 2019-11-15 Li Xiang , Chen Shuo , Xia Yan , Yang Jian

Normalization operations are essential for state-of-the-art neural networks and enable us to train a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the…

Computer Vision and Pattern Recognition · Computer Science 2021-03-23 Yuxiang Liu , Jidong Ge , Chuanyi Li , Jie Gui

The success of deep neural networks is in part due to the use of normalization layers. Normalization layers like Batch Normalization, Layer Normalization and Weight Normalization are ubiquitous in practice, as they improve generalization…

Machine Learning · Computer Science 2020-06-15 Yonatan Dukler , Quanquan Gu , Guido Montúfar
‹ Prev 1 2 3 10 Next ›