Related papers: Speed Limits for Deep Learning

200 papers

The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks has been well-proven.…

Machine Learning · Computer Science 2021-07-22 Wei Huang, Weitao Du, Richard Yi Da Xu

In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at…

Machine Learning · Computer Science 2020-10-29 Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training…

Machine Learning · Computer Science 2024-05-21 G. Welper

The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent. However, despite its importance, the super-quadratic runtime of kernel methods limits the use of…

Machine Learning · Computer Science 2021-07-28 Amir Zandieh

Scaling laws offer valuable insights into the relationship between neural network performance and computational cost, yet their underlying mechanisms remain poorly understood. In this work, we empirically analyze how neural networks behave…

Machine Learning · Computer Science 2025-07-08 Konstantin Nikolaou, Sven Krippendorf, Samuel Tovey, Christian Holm

The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we…

Machine Learning · Computer Science 2022-06-29 Jongmin Lee, Joo Young Choi, Ernest K. Ryu, Albert No

At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a…

Machine Learning · Computer Science 2020-02-11 Arthur Jacot, Franck Gabriel, Clément Hongler

Deep neural networks (DNNs) are powerful tools for compressing and distilling information. Their scale and complexity, often involving billions of inter-dependent parameters, render direct microscopic analysis difficult. Under such…

Machine Learning · Statistics 2022-09-26 Inbar Seroussi, Gadi Naveh, Zohar Ringel

Biological systems have to build models from their sensory data that allow them to efficiently process previously unseen inputs. Here, we study a neural network learning a linearly separable rule using examples provided by a teacher. We…

Statistical Mechanics · Physics 2017-11-22 Sebastian Goldt, Udo Seifert

Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the…

Machine Learning · Computer Science 2020-12-30 Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart

Virtually every organism gathers information about its noisy environment and builds models from that data, mostly using neural networks. Here, we use stochastic thermodynamics to analyse the learning of a classification rule by a neural…

Statistical Mechanics · Physics 2017-01-31 Sebastian Goldt, Udo Seifert

In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics and generative models. In the first part, results on excess risks for neural networks are…

Machine Learning · Statistics 2024-09-17 Namjoon Suh, Guang Cheng

The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at…

Machine Learning · Statistics 2023-05-23 Simone Bombari, Mohammad Hossein Amani, Marco Mondelli

Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent in parameter space is related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019)…

Machine Learning · Statistics 2022-05-26 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

We consider optimizing two-layer neural networks in the mean-field regime where the learning dynamics of network weights can be approximated by the evolution in the space of probability measures over the weight parameters associated with…

Machine Learning · Computer Science 2022-10-19 Jingwei Zhang, Xunpeng Huang, Jincheng Yu

Neural networks are known for their ability to approximate smooth functions, yet they fail to generalize perfectly to unseen inputs when trained on discrete operations. Such operations lie at the heart of algorithmic tasks such as…

Machine Learning · Computer Science 2026-02-03 Artur Back de Luca, George Giapitzakis, Kimon Fountoulakis

The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide neural networks trained under least squares loss by gradient descent. Recent works also report that NTK regression can outperform finitely-wide neural networks…

Machine Learning · Computer Science 2021-12-09 Amir Zandieh, Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo Shin

The rapid growth of deep neural networks (DNNs) has brought increasing attention to their energy use during training and inference. Here, we establish the thermodynamic bounds on energy consumption in quasi-static analog DNNs by mapping…

Statistical Mechanics · Physics 2025-12-09 Alexei V. Tkachenko

We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This provides two major insights: first, that a measure of a model's training speed can be used to estimate its…

Machine Learning · Computer Science 2020-10-28 Clare Lyle, Lisa Schut, Binxin Ru, Yarin Gal, Mark van der Wilk

Recently, neural tangent kernel (NTK) has been used to explain the dynamics of learning parameters of neural networks, at the large width limit. Quantitative analyses of NTK give rise to network widths that are often impractical and incur…

Machine Learning · Computer Science 2022-10-11 Nir Ailon, Supratim Shit