Related papers: Speed Limits for Deep Learning

Neural Spectrum Alignment: Empirical Study

Expressiveness and generalization of deep models were recently addressed via the connection between neural networks (NNs) and kernel learning, where first-order dynamics of NNs during gradient-descent (GD) optimization were related to…

Machine Learning · Computer Science 2020-04-21 Dmitry Kopitkov, Vadim Indelman

A Revision of Neural Tangent Kernel-based Approaches for Neural Networks

Recent theoretical works based on the neural tangent kernel (NTK) have shed light on the optimization and generalization of over-parameterized networks, and partially bridge the gap between their practical success and classical learning…

Machine Learning · Computer Science 2020-08-10 Kyung-Su Kim, Aurélie C. Lozano, Eunho Yang

On Exact Computation with an Infinitely Wide Neural Net

How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset such as CIFAR-10 when its width, namely the number of channels in convolutional layers and the number of nodes in fully-connected internal…

Machine Learning · Computer Science 2019-11-05 Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang

Training Spiking Neural Networks with Local Tandem Learning

Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors. However, there is a lack of an efficient and generalized training method for deep SNNs, especially for deployment on…

Neural and Evolutionary Computing · Computer Science 2022-10-11 Qu Yang, Jibin Wu, Malu Zhang, Yansong Chua, Xinchao Wang, Haizhou Li

Weighted Neural Tangent Kernel: A Generalized and Improved Network-Induced Kernel

The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over-parameterized Neural Network (NN) trained by gradient descent. However, it is now well-known that gradient descent is not always…

Machine Learning · Computer Science 2021-03-23 Lei Tan, Shutong Wu, Xiaolin Huang

Graph Neural Tangent Kernel: Convergence on Large Graphs

Graph neural networks (GNNs) achieve remarkable performance in graph machine learning tasks but can be hard to train on large-graph data, where their learning dynamics are not well understood. We investigate the training dynamics of…

Machine Learning · Computer Science 2023-06-02 Sanjukta Krishnagopal, Luana Ruiz

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging. In an attempt to…

Machine Learning · Statistics 2024-03-04 Lingyu Gu, Yongqi Du, Yuan Zhang, Di Xie, Shiliang Pu, Robert C. Qiu, Zhenyu Liao

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

This paper presents a theoretical analysis and practical evaluation of the main bottlenecks towards a scalable distributed solution for the training of Deep Neural Networks (DNNs). The presented results show that the current state of the…

Computer Vision and Pattern Recognition · Computer Science 2016-12-06 Janis Keuper, Franz-Josef Pfreundt

Evolution of Neural Tangent Kernels under Benign and Adversarial Training

Two key challenges facing modern deep learning are mitigating deep networks' vulnerability to adversarial attacks and understanding deep learning's generalization capabilities. Towards the first issue, many defense strategies have been…

Machine Learning · Computer Science 2022-10-24 Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus

Rethinking Neural Network Learning Rates: A Stackelberg Perspective

Neural networks are typically trained with a single learning rate across all layers. While recent empirical evidence suggests that assigning layer-specific learning rates can accelerate training, a principled understanding of the conditions…

Machine Learning · Computer Science 2026-05-26 Sihan Zeng, Sujay Bhatt, Sumitra Ganesh

Reactivation: Empirical NTK Dynamics Under Task Shifts

The Neural Tangent Kernel (NTK) offers a powerful tool to study the functional dynamics of neural networks. In the so-called lazy, or kernel regime, the NTK remains static during training and the network function is linear in the static…

Machine Learning · Computer Science 2025-07-28 Yuzhi Liu, Zixuan Chen, Zirui Zhang, Yufei Liu, Giulia Lanzillotta

A mean-field limit for certain deep neural networks

Understanding deep neural networks (DNNs) is a key challenge in the theory of machine learning, with potential applications to the many fields where DNNs have been successfully used. This article presents a scaling limit for a DNN being…

Statistics Theory · Mathematics 2019-06-04 Dyego Araújo, Roberto I. Oliveira, Daniel Yukimura

Connecting NTK and NNGP: A Unified Theoretical Framework for Wide Neural Network Learning Dynamics

Artificial neural networks have revolutionized machine learning in recent years, but a complete theoretical framework for their learning process is still lacking. Substantial advances were achieved for wide networks, within two disparate…

Machine Learning · Computer Science 2025-05-09 Yehonatan Avidan, Qianyi Li, Haim Sompolinsky

Disentangling Trainability and Generalization in Deep Neural Networks

A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable, and if so, how well it might generalize to unseen data. In this work, we provide such a…

Machine Learning · Computer Science 2020-07-14 Lechao Xiao, Jeffrey Pennington, Samuel S. Schoenholz

Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics

We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error. Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization…

Machine Learning · Computer Science 2020-10-28 Taiji Suzuki

How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective

Recent works have examined theoretical and empirical properties of wide neural networks trained in the Neural Tangent Kernel (NTK) regime. Given that biological neural networks are much wider than their artificial counterparts, we consider…

Machine Learning · Computer Science 2022-07-14 Akhilan Boopathy, Ila Fiete

Distributed Training of Deep Neural Networks with Theoretical Analysis: Under SSP Setting

We propose a distributed approach to train deep neural networks (DNNs), which has guaranteed convergence theoretically and great scalability empirically: close to 6 times faster on an instance of the ImageNet data set when run with 6 machines. The…

Machine Learning · Statistics 2016-10-04 Abhimanu Kumar, Pengtao Xie, Junming Yin, Eric P. Xing

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks

Recent research shows that the following two models are equivalent: (a) infinitely wide neural networks (NNs) trained under l2 loss by gradient descent with infinitesimally small learning rate (b) kernel regression with respect to so-called…

Machine Learning · Computer Science 2019-10-29 Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu

Deep Neural Network Training as Random Effects: An Optimization-Inference Duality

Deep neural networks (DNNs) have achieved remarkable empirical success, yet their training dynamics remain understood mainly from optimization rather than statistical principles. Here we develop a statistical framework for DNN training in…

Machine Learning · Statistics 2026-05-28 Minhao Yao, Ruoyu Wang, Xihong Lin, Lin Liu, Zhonghua Liu

Deep Learning in Target Space

Deep learning uses neural networks which are parameterised by their weights. The neural networks are usually trained by tuning the weights to directly minimise a given loss function. In this paper we propose to re-parameterise the weights…

Neural and Evolutionary Computing · Computer Science 2022-03-14 Michael Fairbank, Spyridon Samothrakis, Luca Citi