Related papers: Speed Limits for Deep Learning

Optimal Complexity in Non-Convex Decentralized Learning over Time-Varying Networks

Decentralized optimization with time-varying networks is an emerging paradigm in machine learning. It saves remarkable communication overhead in large-scale deep training and is more robust in wireless scenarios especially when nodes are…

Machine Learning · Computer Science · 2022-11-02 · Xinmeng Huang, Kun Yuan

Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs

Physics-informed Kolmogorov-Arnold Networks (PIKANs), and in particular their Chebyshev-based variants (cPIKANs), have recently emerged as promising models for solving partial differential equations (PDEs). However, their training dynamics…

Machine Learning · Computer Science · 2025-06-10 · Salah A. Faroughi, Farinaz Mostajeran

Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to…

Machine Learning · Computer Science · 2026-05-21 · Konstantin Riedl, Konstantinos Spiliopoulos, Justin Sirignano

A Synapse-Threshold Synergistic Learning Approach for Spiking Neural Networks

Spiking neural networks (SNNs) have demonstrated excellent capabilities in various intelligent scenarios. Most existing methods for training SNNs are based on the concept of synaptic plasticity; however, learning in the realistic brain also…

Neural and Evolutionary Computing · Computer Science · 2023-04-04 · Hongze Sun, Wuque Cai, Baoxin Yang, Yan Cui, Yang Xia, Dezhong Yao, Daqing Guo

Efficient Learning for Deep Quantum Neural Networks

Neural networks enjoy widespread success in both research and industry and, with the imminent advent of quantum technology, it is now a crucial challenge to design quantum neural networks for fully quantum learning tasks. Here we propose…

Quantum Physics · Physics · 2020-04-30 · Kerstin Beer, Dmytro Bondarenko, Terry Farrelly, Tobias J. Osborne, Robert Salzmann, Ramona Wolf

Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation

We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple…

Machine Learning · Computer Science · 2023-06-23 · Xin Yuan, Pedro Savarese, Michael Maire

Fast Training of Convolutional Neural Networks via Kernel Rescaling

Training deep Convolutional Neural Networks (CNN) is a time consuming task that may take weeks to complete. In this article we propose a novel, theoretically founded method for reducing CNN training time without incurring any loss in…

Computer Vision and Pattern Recognition · Computer Science · 2016-10-13 · Pedro Porto Buarque de Gusmão, Gianluca Francini, Skjalg Lepsøy, Enrico Magli

Training Infinitely Deep and Wide Transformers

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based…

Optimization and Control · Mathematics · 2026-05-19 · Raphaël Barboni, Maarten V. de Hoop, Takashi Furuya, Gabriel Peyré

Channel Estimation by Infinite Width Convolutional Networks

In wireless communications, estimation of channels in OFDM systems spans frequency and time, which relies on sparse collections of pilot data, posing an ill-posed inverse problem. Moreover, deep learning estimators require large amounts of…

Machine Learning · Computer Science · 2025-04-14 · Mohammed Mallik, Guillaume Villemaud

Fast Adaptation with Linearized Neural Networks

The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings. We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of…

Machine Learning · Statistics · 2021-04-29 · Wesley J. Maddox, Shuai Tang, Pablo Garcia Moreno, Andrew Gordon Wilson, Andreas Damianou

Improving Stability and Performance of Spiking Neural Networks through Enhancing Temporal Consistency

Spiking neural networks have gained significant attention due to their brain-like information processing capabilities. The use of surrogate gradients has made it possible to train spiking neural networks with backpropagation, leading to…

Neural and Evolutionary Computing · Computer Science · 2023-05-24 · Dongcheng Zhao, Guobin Shen, Yiting Dong, Yang Li, Yi Zeng

Going Deeper With Directly-Trained Larger Spiking Neural Networks

Spiking neural networks (SNNs) are promising in a bio-plausible coding for spatio-temporal information and event-driven signal processing, which is very suited for energy-efficient implementation in neuromorphic hardware. However, the…

Neural and Evolutionary Computing · Computer Science · 2020-12-21 · Hanle Zheng, Yujie Wu, Lei Deng, Yifan Hu, Guoqi Li

Deep Networks with Stochastic Depth

Very deep convolutional networks with hundreds of layers have led to significant reductions in error on competitive benchmarks. Although the unmatched expressiveness of the many layers can be highly desirable at test time, training very…

Machine Learning · Computer Science · 2016-08-01 · Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Weinberger

Training Neural Networks by Optimizing Neuron Positions

The high computational complexity and increasing parameter counts of deep neural networks pose significant challenges for deployment in resource-constrained environments, such as edge devices or real-time systems. To address this, we…

Machine Learning · Computer Science · 2025-06-17 · Laura Erb, Tommaso Boccato, Alexandru Vasilache, Juergen Becker, Nicola Toschi

Layer-Specific Adaptive Learning Rates for Deep Networks

The increasing complexity of deep learning architectures is resulting in training time requiring weeks or even months. This slow training is due in part to vanishing gradients, in which the gradients used by back-propagation are extremely…

Computer Vision and Pattern Recognition · Computer Science · 2015-10-16 · Bharat Singh, Soham De, Yangmuzi Zhang, Thomas Goldstein, Gavin Taylor

Deep Spiking Neural Network with Spike Count based Learning Rule

Deep spiking neural networks (SNNs) support asynchronous event-driven computation, massive parallelism and demonstrate great potential to improve the energy efficiency of its synchronous analog counterpart. However, insufficient attention…

Neural and Evolutionary Computing · Computer Science · 2019-02-18 · Jibin Wu, Yansong Chua, Malu Zhang, Qu Yang, Guoqi Li, Haizhou Li

Efficient Neural Network Training via Subset Pretraining

In training neural networks, it is common practice to use partial gradients computed over batches, mostly very small subsets of the training set. This approach is motivated by the argument that such a partial gradient is close to the true…

Machine Learning · Computer Science · 2024-11-25 · Jan Spörer, Bernhard Bermeitinger, Tomas Hrycej, Niklas Limacher, Siegfried Handschuh

Theory-training deep neural networks for an alloy solidification benchmark problem

Deep neural networks are machine learning tools that are transforming fields ranging from speech recognition to computational medicine. In this study, we extend their application to the field of alloy solidification modeling. To that end,…

Applied Physics · Physics · 2019-12-23 · M. Torabi Rad, A. Viardin, G. J. Schmitz, M. Apel

Training a General Spiking Neural Network with Improved Efficiency and Minimum Latency

Spiking Neural Networks (SNNs) that operate in an event-driven manner and employ binary spike representation have recently emerged as promising candidates for energy-efficient computing. However, a cost bottleneck arises in obtaining…

Neural and Evolutionary Computing · Computer Science · 2024-01-22 · Yunpeng Yao, Man Wu, Zheng Chen, Renyuan Zhang

On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective

The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well-understood and as a result enjoy algorithmic…

Machine Learning · Computer Science · 2024-05-30 · Jonathan Wenger, Felix Dangel, Agustinus Kristiadi