Related papers: A Practical Layer-Parallel Training Algorithm for …

Layer-Parallel Training of Residual Networks with Auxiliary-Variable Networks

Gradient-based methods for the distributed training of residual networks (ResNets) typically require a forward pass of the input data, followed by back-propagating the error gradient to update model parameters, which becomes time-consuming…

Machine Learning · Computer Science · 2021-12-13 · Qi Sun, Hexin Dong, Zewei Chen, Jiacheng Sun, Zhenguo Li, Bin Dong

Layer-Parallel Training of Deep Residual Neural Networks

Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be…

Optimization and Control · Mathematics · 2019-07-26 · S. Günther, L. Ruthotto, J. B. Schroder, E. C. Cyr, N. R. Gauger

ADMM Algorithms for Residual Network Training: Convergence Analysis and Parallel Implementation

We propose both serial and parallel proximal (linearized) alternating direction method of multipliers (ADMM) algorithms for training residual neural networks. In contrast to backpropagation-based approaches, our methods inherently mitigate…

Machine Learning · Computer Science · 2025-04-01 · Jintao Xu, Yifei Li, Wenxun Xing

Layer-Parallel Training for Transformers

We present a new training methodology for transformers using a multilevel, layer-parallel approach. Through a neural ODE formulation of transformers, our application of a multilevel parallel-in-time algorithm for the forward and…

Machine Learning · Computer Science · 2026-01-27 · Shuai Jiang, Marc Salvadó-Benasco, Eric C. Cyr, Alena Kopaničáková, Rolf Krause, Jacob B. Schroder

Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment

Training deep neural networks on large-scale datasets requires significant hardware resources whose costs (even on cloud platforms) put them out of reach of smaller organizations, groups, and individuals. Backpropagation, the workhorse for…

Machine Learning · Computer Science · 2020-09-22 · Alexander Ororbia, Ankur Mali, Daniel Kifer, C. Lee Giles

Block-wise Training of Residual Networks via the Minimizing Movement Scheme

End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward…

Machine Learning · Computer Science · 2023-06-07 · Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari

Multilevel Minimization for Deep Residual Networks

We present a new multilevel minimization framework for the training of deep residual networks (ResNets), which has the potential to significantly reduce training time and effort. Our framework is based on the dynamical system's viewpoint,…

Machine Learning · Computer Science · 2020-04-15 · Lisa Gaedke-Merzhäuser, Alena Kopaničáková, Rolf Krause

Layer-Wise Partitioning and Merging for Efficient and Scalable Deep Learning

Deep Neural Network (DNN) models are usually trained sequentially from one layer to another, which causes forward, backward, and update locking problems, leading to poor performance in terms of training time. The existing parallel…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-07-25 · Samson B. Akintoye, Liangxiu Han, Huw Lloyd, Xin Zhang, Darren Dancey, Haoming Chen, Daoqiang Zhang

Accelerated Training through Iterative Gradient Propagation Along the Residual Path

Despite being the cornerstone of deep learning, backpropagation is criticized for its inherent sequentiality, which can limit the scalability of very deep models. Such models faced convergence issues due to vanishing gradient, later…

Machine Learning · Computer Science · 2025-04-01 · Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen

Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science · 2025-02-10 · Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney

Towards an Optimal Control Perspective of ResNet Training

We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden…

Machine Learning · Computer Science · 2025-06-27 · Jens Püttschneider, Simon Heilig, Asja Fischer, Timm Faulwasser

An adaptive augmented Lagrangian method for training physics and equality constrained artificial neural networks

Physics and equality constrained artificial neural networks (PECANN) are grounded in methods of constrained optimization to properly constrain the solution of partial differential equations (PDEs) with their boundary and initial conditions…

Machine Learning · Computer Science · 2023-07-18 · Shamsulhaq Basir, Inanc Senocak

Globally Convergent Multilevel Training of Deep Residual Networks

We propose a globally convergent multilevel training method for deep residual networks (ResNets). The devised method can be seen as a novel variant of the recursive multilevel trust-region (RMTR) method, which operates in hybrid…

Machine Learning · Computer Science · 2022-06-14 · Alena Kopaničáková, Rolf Krause

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models

We propose a new technique that boosts the convergence of training generative adversarial networks. Generally, the rate of training deep models reduces severely after multiple iterations. A key reason for this phenomenon is that a deep…

Machine Learning · Statistics · 2018-06-15 · Atsushi Nitanda, Taiji Suzuki

Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation

We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple…

Machine Learning · Computer Science · 2023-06-23 · Xin Yuan, Pedro Savarese, Michael Maire

Multi-level Residual Networks from Dynamical Systems View

Deep residual networks (ResNets) and their variants are widely used in many computer vision applications and natural language processing tasks. However, the theoretical principles for designing and training ResNets are still not fully…

Machine Learning · Statistics · 2018-02-05 · Bo Chang, Lili Meng, Eldad Haber, Frederick Tung, David Begert

Learning to solve the credit assignment problem

Backpropagation is driving today's artificial neural networks (ANNs). However, despite extensive research, it remains unclear if the brain implements this algorithm. Among neuroscientists, reinforcement learning (RL) algorithms are often…

Neurons and Cognition · Quantitative Biology · 2020-04-24 · Benjamin James Lansdell, Prashanth Ravi Prakash, Konrad Paul Kording

Parallel Training of Deep Networks with Local Updates

Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count so have the compute budgets and times…

Machine Learning · Computer Science · 2021-06-16 · Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

Optimizing Distributed Training Approaches for Scaling Neural Networks

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-04-01 · Vishnu Vardhan Baligodugula, Fathi Amsaad

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x…

Machine Learning · Computer Science · 2020-09-01 · Andrew C. Kirby, Siddharth Samsi, Michael Jones, Albert Reuther, Jeremy Kepner, Vijay Gadepally