Related papers: Speed Limits for Deep Learning

Feature Learning in Infinite-Width Neural Networks

As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK…

Machine Learning · Computer Science 2022-07-18 Greg Yang, Edward J. Hu

Analyzing Convergence in Quantum Neural Networks: Deviations from Neural Tangent Kernels

A quantum neural network (QNN) is a parameterized mapping efficiently implementable on near-term Noisy Intermediate-Scale Quantum (NISQ) computers. It can be used for supervised learning when combined with classical gradient-based…

Quantum Physics · Physics 2023-03-28 Xuchen You, Shouvanik Chakrabarti, Boyang Chen, Xiaodi Wu

New Insights into Graph Convolutional Networks using Neural Tangent Kernels

Graph Convolutional Networks (GCNs) have emerged as powerful tools for learning on network structured data. Although empirically successful, GCNs exhibit certain behaviour that has no rigorous explanation -- for instance, the performance of…

Machine Learning · Computer Science 2023-11-07 Mahalakshmi Sabanayagam, Pascal Esser, Debarghya Ghoshdastidar

A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers

Current Deep Learning approaches have been very successful using convolutional neural networks (CNN) trained on large graphical processing units (GPU)-based computers. Three limitations of this approach are: 1) they are based on a simple…

Neural and Evolutionary Computing · Computer Science 2017-07-17 Thomas E. Potok, Catherine Schuman, Steven R. Young, Robert M. Patton, Federico Spedalieri, Jeremy Liu, Ke-Thia Yao, Garrett Rose, Gangotree Chakma

How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent

We analyze the generalization properties of two-layer neural networks in the neural tangent kernel (NTK) regime, trained with gradient descent (GD). For early stopped GD we derive fast rates of convergence that are known to be minimax…

Machine Learning · Statistics 2023-09-18 Mike Nguyen, Nicole Mücke

Frequency Bias in Neural Networks for Input of Non-Uniform Density

Recent works have partly attributed the generalization ability of over-parameterized neural networks to frequency bias -- networks trained with gradient descent on data drawn from a uniform distribution find a low frequency fit before high…

Machine Learning · Computer Science 2020-03-11 Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten, Shira Kritchman

A Theory of Neural Tangent Kernel Alignment and Its Influence on Training

The training dynamics and generalization properties of neural networks (NN) can be precisely characterized in function space via the neural tangent kernel (NTK). Structural changes to the NTK during training reflect feature learning and…

Machine Learning · Statistics 2022-02-11 Haozhe Shan, Blake Bordelon

Towards a Phenomenological Understanding of Neural Networks: Data

A theory of neural networks (NNs) built upon collective variables would provide scientists with the tools to better understand the learning process at every stage. In this work, we introduce two such variables, the entropy and the trace of…

Machine Learning · Computer Science 2023-05-03 Samuel Tovey, Sven Krippendorf, Konstantin Nikolaou, Christian Holm

Assessing the Impact of Low Resolution Control Electronics on Quantum Neural Network Performance

Scaling quantum computers requires tight integration of cryogenic control electronics with quantum processors, where Digital-to-Analog Converters (DACs) face severe power and area constraints. We investigate quantum neural network (QNN)…

Quantum Physics · Physics 2026-02-04 Rupayan Bhattacharjee, Rohit Sarma Sarkar, Sergi Abadal, Carmen G. Almudever, Eduard Alarcon

The Recurrent Neural Tangent Kernel

The study of deep neural networks (DNNs) in the infinite-width limit, via the so-called neural tangent kernel (NTK) approach, has provided new insights into the dynamics of learning, generalization, and the impact of initialization. One key…

Machine Learning · Computer Science 2021-06-16 Sina Alemohammad, Zichao Wang, Randall Balestriero, Richard Baraniuk

NTK-DFL: Enhancing Decentralized Federated Learning in Heterogeneous Settings via Neural Tangent Kernel

Decentralized federated learning (DFL) is a collaborative machine learning framework for training a model across participants without a central server or raw data exchange. DFL faces challenges due to statistical heterogeneity, as…

Machine Learning · Computer Science 2025-06-16 Gabriel Thompson, Kai Yue, Chau-Wai Wong, Huaiyu Dai

Infinite-width limit of deep linear neural networks

This paper studies the infinite-width limit of deep linear neural networks initialized with random parameters. We obtain that, when the number of neurons diverges, the training dynamics converge (in a precise sense) to the dynamics obtained…

Machine Learning · Computer Science 2022-12-01 Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that…

Machine Learning · Computer Science 2020-06-26 Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Analyzing Finite Neural Networks: Can We Trust Neural Tangent Kernel Theory?

Neural Tangent Kernel (NTK) theory is widely used to study the dynamics of infinitely-wide deep neural networks (DNNs) under gradient descent. But do the results for infinitely-wide networks give us hints about the behavior of real…

Machine Learning · Computer Science 2022-02-02 Mariia Seleznova, Gitta Kutyniok

Towards Practical Quantum Neural Network Diagnostics with Neural Tangent Kernels

Knowing whether a Quantum Machine Learning model would perform well on a given dataset before training it can help to save critical resources. However, gathering a priori information about model performance (e.g., training speed, critical…

Quantum Physics · Physics 2025-03-05 Francesco Scala, Christa Zoufal, Dario Gerace, Francesco Tacchino

The Neural Tangent Kernel for Classification

In wide neural networks, the Neural Tangent Kernel (NTK) remains approximately constant during training, providing a powerful theoretical tool for studying training dynamics, generalization, and connections to kernel methods. However, this…

Machine Learning · Computer Science 2026-05-26 Jonathan Plenk, Sergio Calvo-Ordonez, Alvaro Cartea, Yarin Gal, Mark van der Wilk, Kamil Ciosek

Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

Despite their prevalence in deep-learning communities, over-parameterized models convey high demands of computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized…

Machine Learning · Computer Science 2024-05-14 Yubin Shi, Yixuan Chen, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Yujiang Wang, Robert P. Dick, Qin Lv, Yingying Zhao, Fan Yang, Tun Lu, Ning Gu, Li Shang

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

We analyze speed of convergence to global optimum for gradient descent training a deep linear neural network (parameterized as $x \mapsto W_N W_{N-1} \cdots W_1 x$) by minimizing the $\ell_2$ loss over whitened data. Convergence at a linear…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu

Efficient Training Convolutional Neural Networks on Edge Devices with Gradient-pruned Sign-symmetric Feedback Alignment

With the prosperity of mobile devices, the distributed learning approach enabling model training with decentralized data has attracted wide research. However, the lack of training capability for edge devices significantly limits the energy…

Machine Learning · Computer Science 2021-05-14 Ziyang Hong, C. Patrick Yue

NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks

Long training times of deep neural networks are a bottleneck in machine learning research. The major impediment to fast training is the quadratic growth of both memory and compute requirements of dense and convolutional layers with respect…

Machine Learning · Computer Science 2020-02-20 Mihailo Isakov, Michel A. Kinsy