Related papers: A Practical Layer-Parallel Training Algorithm for …

200 papers

Gradient-based methods for the distributed training of residual networks (ResNets) typically require a forward pass of the input data, followed by back-propagating the error gradient to update model parameters, which becomes time-consuming…

Machine Learning · Computer Science · 2021-12-13 · Qi Sun, Hexin Dong, Zewei Chen, Jiacheng Sun, Zhenguo Li, Bin Dong

Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be…

Optimization and Control · Mathematics · 2019-07-26 · S. Günther, L. Ruthotto, J. B. Schroder, E. C. Cyr, N. R. Gauger

We propose both serial and parallel proximal (linearized) alternating direction method of multipliers (ADMM) algorithms for training residual neural networks. In contrast to backpropagation-based approaches, our methods inherently mitigate…

Machine Learning · Computer Science · 2025-04-01 · Jintao Xu, Yifei Li, Wenxun Xing

We present a new training methodology for transformers using a multilevel, layer-parallel approach. Through a neural ODE formulation of transformers, our application of a multilevel parallel-in-time algorithm for the forward and…

Machine Learning · Computer Science · 2026-01-27 · Shuai Jiang, Marc Salvadó-Benasco, Eric C. Cyr, Alena Kopaničáková, Rolf Krause, Jacob B. Schroder

Training deep neural networks on large-scale datasets requires significant hardware resources whose costs (even on cloud platforms) put them out of reach of smaller organizations, groups, and individuals. Backpropagation, the workhorse for…

Machine Learning · Computer Science · 2020-09-22 · Alexander Ororbia, Ankur Mali, Daniel Kifer, C. Lee Giles

End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward…

Machine Learning · Computer Science · 2023-06-07 · Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari

We present a new multilevel minimization framework for the training of deep residual networks (ResNets), which has the potential to significantly reduce training time and effort. Our framework is based on the dynamical system's viewpoint,…

Machine Learning · Computer Science · 2020-04-15 · Lisa Gaedke-Merzhäuser, Alena Kopaničáková, Rolf Krause

Deep Neural Network (DNN) models are usually trained sequentially from one layer to another, which causes forward, backward, and update locking problems, leading to poor performance in terms of training time. The existing parallel…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-07-25 · Samson B. Akintoye, Liangxiu Han, Huw Lloyd, Xin Zhang, Darren Dancey, Haoming Chen, Daoqiang Zhang

Despite being the cornerstone of deep learning, backpropagation is criticized for its inherent sequentiality, which can limit the scalability of very deep models. Such models faced convergence issues due to vanishing gradient, later…

Machine Learning · Computer Science · 2025-04-01 · Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science · 2025-02-10 · Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney

We propose a training formulation for ResNets reflecting an optimal control problem that is applicable for standard architectures and general loss functions. We suggest bridging both worlds via penalizing intermediate outputs of hidden…

Machine Learning · Computer Science · 2025-06-27 · Jens Püttschneider, Simon Heilig, Asja Fischer, Timm Faulwasser

Physics and equality constrained artificial neural networks (PECANN) are grounded in methods of constrained optimization to properly constrain the solution of partial differential equations (PDEs) with their boundary and initial conditions…

Machine Learning · Computer Science · 2023-07-18 · Shamsulhaq Basir, Inanc Senocak

We propose a globally convergent multilevel training method for deep residual networks (ResNets). The devised method can be seen as a novel variant of the recursive multilevel trust-region (RMTR) method, which operates in hybrid…

Machine Learning · Computer Science · 2022-06-14 · Alena Kopaničáková, Rolf Krause

We propose a new technique that boosts the convergence of training generative adversarial networks. Generally, the rate of training deep models reduces severely after multiple iterations. A key reason for this phenomenon is that a deep…

Machine Learning · Statistics · 2018-06-15 · Atsushi Nitanda, Taiji Suzuki

We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple…

Machine Learning · Computer Science · 2023-06-23 · Xin Yuan, Pedro Savarese, Michael Maire

Deep residual networks (ResNets) and their variants are widely used in many computer vision applications and natural language processing tasks. However, the theoretical principles for designing and training ResNets are still not fully…

Machine Learning · Statistics · 2018-02-05 · Bo Chang, Lili Meng, Eldad Haber, Frederick Tung, David Begert

Backpropagation is driving today's artificial neural networks (ANNs). However, despite extensive research, it remains unclear if the brain implements this algorithm. Among neuroscientists, reinforcement learning (RL) algorithms are often…

Neurons and Cognition · Quantitative Biology · 2020-04-24 · Benjamin James Lansdell, Prashanth Ravi Prakash, Konrad Paul Kording

Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count so have the compute budgets and times…

This paper presents a comparative analysis of distributed training strategies for large-scale neural networks, focusing on data parallelism, model parallelism, and hybrid approaches. We evaluate these strategies on image classification…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-04-01 · Vishnu Vardhan Baligodugula, Fathi Amsaad

A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x…

Machine Learning · Computer Science · 2020-09-01 · Andrew C. Kirby, Siddharth Samsi, Michael Jones, Albert Reuther, Jeremy Kepner, Vijay Gadepally