Related papers: Speed Limits for Deep Learning

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks has been well-proven.…

Machine Learning · Computer Science 2021-07-22 Wei Huang, Weitao Du, Richard Yi Da Xu

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel

In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight expansion of the network at…

Machine Learning · Computer Science 2020-10-29 Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

Approximation and Gradient Descent Training with Neural Networks

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training…

Machine Learning · Computer Science 2024-05-21 G. Welper

Learning with Neural Tangent Kernels in Near Input Sparsity Time

The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely wide neural nets trained under least squares loss by gradient descent. However, despite its importance, the super-quadratic runtime of kernel methods limits the use of…

Machine Learning · Computer Science 2021-07-28 Amir Zandieh

Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens

Scaling laws offer valuable insights into the relationship between neural network performance and computational cost, yet their underlying mechanisms remain poorly understood. In this work, we empirically analyze how neural networks behave…

Machine Learning · Computer Science 2025-07-08 Konstantin Nikolaou, Sven Krippendorf, Samuel Tovey, Christian Holm

Neural Tangent Kernel Analysis of Deep Narrow Neural Networks

The tremendous recent progress in analyzing the training dynamics of overparameterized neural networks has primarily focused on wide networks and therefore does not sufficiently address the role of depth in deep learning. In this work, we…

Machine Learning · Computer Science 2022-06-29 Jongmin Lee, Joo Young Choi, Ernest K. Ryu, Albert No

Neural Tangent Kernel: Convergence and Generalization in Neural Networks

At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a…

Machine Learning · Computer Science 2020-02-11 Arthur Jacot, Franck Gabriel, Clément Hongler

Separation of Scales and a Thermodynamic Description of Feature Learning in Some CNNs

Deep neural networks (DNNs) are powerful tools for compressing and distilling information. Their scale and complexity, often involving billions of inter-dependent parameters, render direct microscopic analysis difficult. Under such…

Machine Learning · Statistics 2022-09-26 Inbar Seroussi, Gadi Naveh, Zohar Ringel

Thermodynamic efficiency of learning a rule in neural networks

Biological systems have to build models from their sensory data that allow them to efficiently process previously unseen inputs. Here, we study a neural network learning a linearly separable rule using examples provided by a teacher. We…

Statistical Mechanics · Physics 2017-11-22 Sebastian Goldt, Udo Seifert

Disentangling feature and lazy training in deep neural networks

Two distinct limits for deep learning have been derived as the network width $h\rightarrow \infty$, depending on how the weights of the last layer scale with $h$. In the Neural Tangent Kernel (NTK) limit, the dynamics becomes linear in the…

Machine Learning · Computer Science 2020-12-30 Mario Geiger, Stefano Spigler, Arthur Jacot, Matthieu Wyart

Stochastic Thermodynamics of Learning

Virtually every organism gathers information about its noisy environment and builds models from that data, mostly using neural networks. Here, we use stochastic thermodynamics to analyse the learning of a classification rule by a neural…

Statistical Mechanics · Physics 2017-01-31 Sebastian Goldt, Udo Seifert

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics and generative models. In the first part, results on excess risks for neural networks are…

Machine Learning · Statistics 2024-09-17 Namjoon Suh, Guang Cheng

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at…

Machine Learning · Statistics 2023-05-23 Simone Bombari, Mohammad Hossein Amani, Marco Mondelli

Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit

Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent in parameter space is related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019)…

Machine Learning · Statistics 2022-05-26 Soufiane Hayou, Arnaud Doucet, Judith Rousseau

Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates

We consider optimizing two-layer neural networks in the mean-field regime where the learning dynamics of network weights can be approximated by the evolution in the space of probability measures over the weight parameters associated with…

Machine Learning · Computer Science 2022-10-19 Jingwei Zhang, Xunpeng Huang, Jincheng Yu

Learning to Add, Multiply, and Execute Algorithmic Instructions Exactly with Neural Networks

Neural networks are known for their ability to approximate smooth functions, yet they fail to generalize perfectly to unseen inputs when trained on discrete operations. Such operations lie at the heart of algorithmic tasks such as…

Machine Learning · Computer Science 2026-02-03 Artur Back de Luca, George Giapitzakis, Kimon Fountoulakis

Scaling Neural Tangent Kernels via Sketching and Random Features

The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide neural networks trained under least squares loss by gradient descent. Recent works also report that NTK regression can outperform finitely-wide neural networks…

Machine Learning · Computer Science 2021-12-09 Amir Zandieh, Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, Jinwoo Shin

Thermodynamic bounds on energy use in quasi-static Deep Neural Networks

The rapid growth of deep neural networks (DNNs) has brought increasing attention to their energy use during training and inference. Here, we establish the thermodynamic bounds on energy consumption in quasi-static analog DNNs by mapping…

Statistical Mechanics · Physics 2025-12-09 Alexei V. Tkachenko

A Bayesian Perspective on Training Speed and Model Selection

We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This provides two major insights: first, that a measure of a model's training speed can be used to estimate its…

Machine Learning · Computer Science 2020-10-28 Clare Lyle, Lisa Schut, Binxin Ru, Yarin Gal, Mark van der Wilk

Efficient NTK using Dimensionality Reduction

Recently, neural tangent kernel (NTK) has been used to explain the dynamics of learning parameters of neural networks, at the large width limit. Quantitative analyses of NTK give rise to network widths that are often impractical and incur…

Machine Learning · Computer Science 2022-10-11 Nir Ailon, Supratim Shit