Related papers: Speed Limits for Deep Learning

Can Stationary Distributions of Scale-Invariant Neural Networks Be Described by the Thermodynamics of an Ideal Gas?

Understanding the training dynamics of deep neural networks remains a major open problem, with physics-inspired approaches offering promising insights. Building on this perspective, we develop a thermodynamic framework to describe the…

Machine Learning · Computer Science 2026-05-15 Ildus Sadrtdinov, Ekaterina Lobacheva, Ivan Klimov, Mikhail Burtsev, Mikhail I. Katsnelson, Dmitry Vetrov

Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization

Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural networks due to the famous result by Jacot et al. (2018): in the infinite-width limit, the NTK is deterministic and constant during training. However, this result…

Machine Learning · Computer Science 2022-07-22 Mariia Seleznova, Gitta Kutyniok

E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings

Convolutional neural networks (CNNs) have been increasingly deployed to edge devices. Hence, many efforts have been made towards efficient CNN inference in resource-constrained platforms. This paper attempts to explore an orthogonal…

Machine Learning · Computer Science 2019-12-09 Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang

Tuning Frequency Bias in Neural Network Training with Nonuniform Data

Small generalization errors of over-parameterized neural networks (NNs) can be partially explained by the frequency biasing phenomenon, where gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency…

Machine Learning · Computer Science 2022-09-27 Annan Yu, Yunan Yang, Alex Townsend

Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models

The rapid growth of machine learning has spurred legislative initiatives such as "the Right to be Forgotten," allowing users to request data removal. In response, "machine unlearning" proposes the selective removal of unwanted data…

Machine Learning · Computer Science 2023-12-25 Guihong Li, Hsiang Hsu, Chun-Fu Chen, Radu Marculescu

Finding trainable sparse networks through Neural Tangent Transfer

Deep neural networks have dramatically transformed machine learning, but their memory and energy demands are substantial. The requirements of real biological neural networks are rather modest in comparison, and one feature that might…

Machine Learning · Computer Science 2020-07-27 Tianlin Liu, Friedemann Zenke

When and why PINNs fail to train: A neural tangent kernel perspective

Physics-informed neural networks (PINNs) have lately received great attention thanks to their flexibility in tackling a wide range of forward and inverse problems involving partial differential equations. However, despite their noticeable…

Machine Learning · Computer Science 2020-07-30 Sifan Wang, Xinling Yu, Paris Perdikaris

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

We analyze the convergence of the averaged stochastic gradient descent for overparameterized two-layer neural networks for regression problems. It was recently found that a neural tangent kernel (NTK) plays an important role in showing the…

Machine Learning · Statistics 2021-06-14 Atsushi Nitanda, Taiji Suzuki

Cyclical Learning Rates for Training Neural Networks

It is known that the learning rate is the most important hyper-parameter to tune for training deep neural networks. This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates…

Computer Vision and Pattern Recognition · Computer Science 2017-04-05 Leslie N. Smith

Limitations of the NTK for Understanding Generalization in Deep Learning

The "Neural Tangent Kernel" (NTK) (Jacot et al. 2018) and its empirical variants have been proposed as a proxy to capture certain behaviors of real neural networks. In this work, we study NTKs through the lens of scaling laws, and…

Machine Learning · Computer Science 2022-06-22 Nikhil Vyas, Yamini Bansal, Preetum Nakkiran

On the Empirical Neural Tangent Kernel of Standard Finite-Width Convolutional Neural Network Architectures

The Neural Tangent Kernel (NTK) is an important milestone in the ongoing effort to build a theory for deep learning. Its prediction that sufficiently wide neural networks behave as kernel methods, or equivalently as random feature models,…

Machine Learning · Computer Science 2020-06-25 Maxim Samarin, Volker Roth, David Belius

Fast Finite Width Neural Tangent Kernel

The Neural Tangent Kernel (NTK), defined as $\Theta_\theta^f(x_1, x_2) = \left[\partial f(\theta, x_1)\big/\partial \theta\right] \left[\partial f(\theta, x_2)\big/\partial \theta\right]^T$ where $\left[\partial f(\theta,…

Machine Learning · Computer Science 2022-06-20 Roman Novak, Jascha Sohl-Dickstein, Samuel S. Schoenholz

Optimizing the energy consumption of spiking neural networks for neuromorphic applications

In the last few years, spiking neural networks have been demonstrated to perform on par with regular convolutional neural networks. Several works have proposed methods to convert a pre-trained CNN to a Spiking CNN without a significant…

Neural and Evolutionary Computing · Computer Science 2020-05-05 Martino Sorbaro, Qian Liu, Massimo Bortone, Sadique Sheik

The Neural Tangent Link Between CNN Denoisers and Non-Local Filters

Convolutional Neural Networks (CNNs) are now a well-established tool for solving computational imaging problems. Modern CNN-based algorithms obtain state-of-the-art performance in diverse image restoration problems. Furthermore, it has been…

Computer Vision and Pattern Recognition · Computer Science 2024-10-22 Julián Tachella, Junqi Tang, Mike Davies

The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks

The Neural Tangent Kernel (NTK) viewpoint is widely employed to analyze the training dynamics of overparameterized Physics-Informed Neural Networks (PINNs). However, unlike the case of linear Partial Differential Equations (PDEs), we show…

Machine Learning · Computer Science 2024-11-01 Andrea Bonfanti, Giuseppe Bruno, Cristina Cipriani

Better Training using Weight-Constrained Stochastic Dynamics

We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of…

Machine Learning · Computer Science 2021-06-22 Benedict Leimkuhler, Tiffany Vlaar, Timothée Pouchon, Amos Storkey

Enhanced Convolutional Neural Tangent Kernels

Recent research shows that for training with $\ell_2$ loss, convolutional neural networks (CNNs) whose width (number of channels in convolutional layers) goes to infinity correspond to regression with respect to the CNN Gaussian Process…

Machine Learning · Computer Science 2019-11-05 Zhiyuan Li, Ruosong Wang, Dingli Yu, Simon S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora

Gradient Descent in Neural Networks as Sequential Learning in RKBS

The study of Neural Tangent Kernels (NTKs) has provided much needed insight into convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network using a first-order Taylor…

Machine Learning · Statistics 2023-02-02 Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh

Dynamics of Deep Neural Networks and Neural Tangent Hierarchy

The evolution of a deep neural network trained by the gradient descent can be described by its neural tangent kernel (NTK) as introduced in [20], where it was proven that in the infinite width limit the NTK converges to an explicit limiting…

Machine Learning · Computer Science 2019-09-19 Jiaoyang Huang, Horng-Tzer Yau

Distributed Deep Learning using Stochastic Gradient Staleness

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still poses considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham, Hyo-Sung Ahn