Related papers: Optimization and Generalization Guarantees for Wei…

Revisiting Initialization of Neural Networks

The proper initialization of weights is crucial for the effective training and fast convergence of deep neural networks (DNNs). Prior work in this area has mostly focused on balancing the variance among weights per layer to maintain…

Machine Learning · Computer Science 2020-06-05 Maciej Skorski , Alessandro Temperoni , Martin Theobald

Function Norms and Regularization in Deep Networks

Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of…

Machine Learning · Computer Science 2018-07-02 Amal Rannen Triki , Maxim Berman , Matthew B. Blaschko

Optimization Theory for ReLU Neural Networks Trained with Normalization Layers

The success of deep neural networks is in part due to the use of normalization layers. Normalization layers like Batch Normalization, Layer Normalization and Weight Normalization are ubiquitous in practice, as they improve generalization…

Machine Learning · Computer Science 2020-06-15 Yonatan Dukler , Quanquan Gu , Guido Montúfar

Minnorm training: an algorithm for training over-parameterized deep neural networks

In this work, we propose a new training method for finding minimum weight norm solutions in over-parameterized neural networks (NNs). This method seeks to improve training speed and generalization performance by framing NN training as a…

Machine Learning · Statistics 2018-06-22 Yamini Bansal , Madhu Advani , David D Cox , Andrew M Saxe

Convolutional Normalization: Improving Deep Convolutional Network Robustness and Training

Normalization techniques have become a basic component in modern convolutional neural networks (ConvNets). In particular, many recent works demonstrate that promoting the orthogonality of the weights helps train deep models and improve…

Computer Vision and Pattern Recognition · Computer Science 2022-01-05 Sheng Liu , Xiao Li , Yuexiang Zhai , Chong You , Zhihui Zhu , Carlos Fernandez-Granda , Qing Qu

On the asymptotic rate of convergence of Stochastic Newton algorithms and their Weighted Averaged versions

The majority of machine learning methods can be regarded as the minimization of an unavailable risk function. To optimize the latter, given samples provided in a streaming fashion, we define a general stochastic Newton algorithm and its…

Statistics Theory · Mathematics 2023-06-30 Claire Boyer , Antoine Godichon-Baggioni

Weight Rescaling: Effective and Robust Regularization for Deep Neural Networks with Batch Normalization

Weight decay is often used to ensure good generalization in the training practice of deep neural networks with batch normalization (BN-DNNs), where some convolution layers are invariant to weight rescaling due to the normalization. In this…

Machine Learning · Computer Science 2022-06-22 Ziquan Liu , Yufei Cui , Jia Wan , Yu Mao , Antoni B. Chan

Norm matters: efficient and accurate normalization schemes in deep networks

Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications. However, the reasons behind its merits remained unanswered, with several…

Machine Learning · Statistics 2019-02-08 Elad Hoffer , Ron Banner , Itay Golan , Daniel Soudry

Globally Optimal Training of Neural Networks with Threshold Activation Functions

Threshold activation functions are highly preferable in neural networks due to their efficiency in hardware implementations. Moreover, their mode of operation is more interpretable and resembles that of biological neurons. However,…

Machine Learning · Computer Science 2023-03-07 Tolga Ergen , Halil Ibrahim Gulluk , Jonathan Lacotte , Mert Pilanci

Generalisation in fully-connected neural networks for time series forecasting

In this paper we study the generalization capabilities of fully-connected neural networks trained in the context of time series forecasting. Time series do not satisfy the typical assumption in statistical learning theory of the data being…

Machine Learning · Statistics 2019-07-30 Anastasia Borovykh , Cornelis W. Oosterlee , Sander M. Bohte

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning…

Machine Learning · Computer Science 2016-06-07 Tim Salimans , Diederik P. Kingma

Fine-grained Optimization of Deep Neural Networks

In recent studies, several asymptotic upper bounds on generalization errors on deep neural networks (DNNs) are theoretically derived. These bounds are functions of several norms of weights of the DNNs, such as the Frobenius and spectral…

Machine Learning · Computer Science 2019-05-23 Mete Ozay

Recurrence of Optimum for Training Weight and Activation Quantized Networks

Deep neural networks (DNNs) are quantized for efficient inference on resource-constrained platforms. However, training deep learning models with low-precision weights and activations involves a demanding optimization task, which calls for…

Machine Learning · Computer Science 2021-05-25 Ziang Long , Penghang Yin , Jack Xin

FixNorm: Dissecting Weight Decay for Training Deep Neural Networks

Weight decay is a widely used technique for training Deep Neural Networks(DNN). It greatly affects generalization performance but the underlying mechanisms are not fully understood. Recent works show that for layers followed by…

Machine Learning · Computer Science 2021-03-30 Yucong Zhou , Yunxiao Sun , Zhao Zhong

Theory III: Dynamics and Generalization in Deep Networks

The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity -- such as an explicit regularization term -- in the training of deep networks for classification. We will show that…

Machine Learning · Computer Science 2020-04-14 Andrzej Banburski , Qianli Liao , Brando Miranda , Lorenzo Rosasco , Fernanda De La Torre , Jack Hidary , Tomaso Poggio

Weight Conditioning for Smooth Optimization of Neural Networks

In this article, we introduce a novel normalization technique for neural network weight matrices, which we term weight conditioning. This approach aims to narrow the gap between the smallest and largest singular values of the weight…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Hemanth Saratchandran , Thomas X. Wang , Simon Lucey

Non-Convex Optimization with Spectral Radius Regularization

We develop regularization methods to find flat minima while training deep neural networks. These minima generalize better than sharp minima, yielding models outperforming baselines on real-world test data (which may be distributed…

Machine Learning · Computer Science 2025-07-04 Adam Sandler , Diego Klabjan , Yuan Luo

Better Training using Weight-Constrained Stochastic Dynamics

We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of…

Machine Learning · Computer Science 2021-06-22 Benedict Leimkuhler , Tiffany Vlaar , Timothée Pouchon , Amos Storkey

Theoretical Insight into Batch Normalization: Data Dependant Auto-Tuning of Regularization Rate

Batch normalization is widely used in deep learning to normalize intermediate activations. Deep networks suffer from notoriously increased training complexity, mandating careful initialization of weights, requiring lower learning rates,…

Machine Learning · Statistics 2022-10-19 Lakshmi Annamalai , Chetan Singh Thakur

Generalizable Adversarial Training via Spectral Normalization

Deep neural networks (DNNs) have set benchmarks on a wide array of supervised learning tasks. Trained DNNs, however, often lack robustness to minor adversarial perturbations to the input, which undermines their true practicality. Recent…

Machine Learning · Computer Science 2018-11-20 Farzan Farnia , Jesse M. Zhang , David Tse