English
Related papers

Related papers: Deep Linear Network Training Dynamics from Random …

200 papers

We perform an average case analysis of the generalization dynamics of large neural networks trained using gradient descent. We study the practically-relevant "high-dimensional" regime where the number of free parameters in the network is on…

Machine Learning · Statistics 2017-10-11 Madhu S. Advani , Andrew M. Saxe

We analyze recurrent neural networks with diagonal hidden-to-hidden weight matrices, trained with gradient descent in the supervised learning setting, and prove that gradient descent can achieve optimality \emph{without} massive…

Machine Learning · Computer Science 2024-10-11 Semih Cayci , Atilla Eryilmaz

Neural networks trained via gradient descent with random initialization and without any regularization enjoy good generalization performance in practice despite being highly overparametrized. A promising direction to explain this phenomenon…

Machine Learning · Computer Science 2022-05-17 Hancheng Min , Salma Tarmoun , Rene Vidal , Enrique Mallada

A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we…

Finding parameters in a deep neural network (NN) that fit training data is a nonconvex optimization problem, but a basic first-order optimization method (gradient descent) finds a global optimizer with perfect fit (zero-loss) in many…

Machine Learning · Computer Science 2025-03-07 Zhiyan Ding , Shi Chen , Qin Li , Stephen Wright

Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep…

Neural and Evolutionary Computing · Computer Science 2014-02-20 Andrew M. Saxe , James L. McClelland , Surya Ganguli

A fairly comprehensive analysis is presented for the gradient descent dynamics for training two-layer neural network models in the situation when the parameters in both layers are updated. General initialization schemes as well as general…

Machine Learning · Computer Science 2020-02-27 Weinan E , Chao Ma , Lei Wu

In this effort we propose a novel approach for reconstructing multivariate functions from training data, by identifying both a suitable network architecture and an initialization using polynomial-based approximations. Training deep neural…

Machine Learning · Computer Science 2019-05-29 Joseph Daws , Clayton G. Webster

The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the…

Machine Learning · Statistics 2023-12-11 Blake Bordelon , Lorenzo Noci , Mufan Bill Li , Boris Hanin , Cengiz Pehlevan

We study the high-dimensional training dynamics of a shallow neural network with quadratic activation in a teacher-student setup. We focus on the extensive-width regime, where the teacher and student network widths scale proportionally with…

Optimization and Control · Mathematics 2026-01-16 Simon Martin , Giulio Biroli , Francis Bach

In this paper, we provide the first precise distributional characterization of gradient descent iterates for general multi-layer neural networks under the canonical single-index regression model, in the `finite-width proportional regime'…

Machine Learning · Computer Science 2025-05-09 Qiyang Han , Masaaki Imaizumi

To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models…

Machine Learning · Computer Science 2021-12-21 Karl Hajjar , Lénaïc Chizat , Christophe Giraud

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the…

Machine Learning · Computer Science 2019-06-12 Difan Zou , Quanquan Gu

The skip-connections used in residual networks have become a standard architecture choice in deep learning due to the increased training stability and generalization performance with this architecture, although there has been limited…

Machine Learning · Computer Science 2019-10-08 Spencer Frei , Yuan Cao , Quanquan Gu

This paper studies the infinite-width limit of deep linear neural networks initialized with random parameters. We obtain that, when the number of neurons diverges, the training dynamics converge (in a precise sense) to the dynamics obtained…

Machine Learning · Computer Science 2022-12-01 Lénaïc Chizat , Maria Colombo , Xavier Fernández-Real , Alessio Figalli

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…

Machine Learning · Computer Science 2019-12-06 Gauthier Gidel , Francis Bach , Simon Lacoste-Julien

Deep neural networks have been used in various machine learning applications and achieved tremendous empirical successes. However, training deep neural networks is a challenging task. Many alternatives have been proposed in place of…

Machine Learning · Computer Science 2020-09-09 Yeonjong Shin

Much attention has been devoted recently to the generalization puzzle in deep learning: large, deep networks can generalize well, but existing theories bounding generalization error are exceedingly loose, and thus cannot explain this…

Machine Learning · Statistics 2019-01-08 Andrew K. Lampinen , Surya Ganguli

A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics…

One of the mysteries in the success of neural networks is randomly initialized first order methods like gradient descent can achieve zero training loss even though the objective function is non-convex and non-smooth. This paper demystifies…

Machine Learning · Computer Science 2019-02-06 Simon S. Du , Xiyu Zhai , Barnabas Poczos , Aarti Singh
‹ Prev 1 2 3 10 Next ›