Related papers: Interlocking Backpropagation: Improving depthwise …

Parallel Training of Deep Networks with Local Updates

Deep learning models trained on large data sets have been widely successful in both vision and language domains. As state-of-the-art deep learning architectures have continued to grow in parameter count, so have the compute budgets and times…

Machine Learning · Computer Science 2021-06-16 Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

Block-local learning with probabilistic latent representations

The ubiquitous backpropagation algorithm requires sequential updates through the network, introducing a locking problem. In addition, backpropagation relies on the transpose of forward weight matrices to compute updates, introducing a…

Machine Learning · Computer Science 2023-10-31 David Kappel, Khaleelulla Khan Nazeer, Cabrel Teguemne Fokam, Christian Mayr, Anand Subramoney

Seeking Next Layer Neurons' Attention for Error-Backpropagation-Like Training in a Multi-Agent Network Framework

Despite considerable theoretical progress in the training of neural networks viewed as a multi-agent system of neurons, particularly concerning biological plausibility and decentralized training, their applicability to real-world problems…

Neural and Evolutionary Computing · Computer Science 2023-10-17 Arshia Soltani Moakhar, Mohammad Azizmalayeri, Hossein Mirzaei, Mohammad Taghi Manzuri, Mohammad Hossein Rohban

Local Learning with Neuron Groups

Traditional deep network training methods optimize a monolithic objective function jointly for all the components. This can lead to various inefficiencies in terms of potential parallelization. Local learning is an approach to…

Machine Learning · Computer Science 2023-01-19 Adeetya Patel, Michael Eickenberg, Eugene Belilovsky

BackLink: Supervised Local Training with Backward Links

Empowered by the backpropagation (BP) algorithm, deep neural networks have dominated the race in solving various cognitive tasks. The restricted training pattern in the standard BP requires end-to-end error propagation, causing large memory…

Machine Learning · Computer Science 2022-05-17 Wenzhe Guo, Mohammed E Fouda, Ahmed M. Eltawil, Khaled N. Salama

A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele

Distributed Training and Optimization Of Neural Networks

Deep learning models are yielding increasingly better performance thanks to multiple factors. To be successful, a model may have a large number of parameters or a complex architecture and be trained on a large dataset. This leads to large…

Machine Learning · Computer Science 2022-12-20 Jean-Roch Vlimant, Junqi Yin

Local vs Global continual learning

Continual learning is the problem of integrating new information in a model while retaining the knowledge acquired in the past. Despite the tangible improvements achieved in recent years, the problem of continual learning is still an open…

Machine Learning · Computer Science 2024-07-24 Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

Predict Globally, Correct Locally: Parallel-in-Time Optimal Control of Neural Networks

The links between optimal control of dynamical systems and neural networks have proved beneficial both from a theoretical and from a practical point of view. Several researchers have exploited these links to investigate the stability of…

Optimization and Control · Mathematics 2019-02-08 Panos Parpas, Corey Muir

Locally Supervised Learning with Periodic Global Guidance

Locally supervised learning aims to train a neural network based on a local estimation of the global loss function at each decoupled module of the network. Auxiliary networks are typically appended to the modules to approximate the gradient…

Machine Learning · Computer Science 2022-08-02 Hasnain Irshad Bhatti, Jaekyun Moon

Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide

With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. While existing surveys provide descriptive…

Machine Learning · Computer Science 2026-02-11 Hossam Amer, Rezaul Karim, Ali Pourranjbar, Weiwei Zhang, Walid Ahmed, Boxing Chen

Distributed Optimization for Over-Parameterized Learning

Distributed optimization often consists of two updating phases: local optimization and inter-node communication. Conventional approaches require working nodes to communicate with the server every one or few iterations to guarantee…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-17 Chi Zhang, Qianxiao Li

A Theory of Local Learning, the Learning Channel, and the Optimality of Backpropagation

In a physical neural system, where storage and processing are intimately intertwined, the rules for adjusting the synaptic weights can only depend on variables that are available locally, such as the activity of the pre- and post-synaptic…

Machine Learning · Computer Science 2016-10-25 Pierre Baldi, Peter Sadowski

Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View

Deep learning has become an indispensable part of life, in applications such as face recognition and NLP, but training deep models has always been a challenge, and in recent years the complexity of training data and models has shown explosive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-18 Sheng Huang

Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism

Scaling models has led to significant advancements in deep learning, but training these models in decentralized settings remains challenging due to communication bottlenecks. While existing compression techniques are effective in…

Machine Learning · Computer Science 2025-06-03 Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, Alexander Long

Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation

Relieving the reliance of neural network training on a global back-propagation (BP) has emerged as a notable research topic due to the biological implausibility and huge memory consumption caused by BP. Among the existing solutions, local…

Machine Learning · Computer Science 2024-06-11 Yibo Yang, Xiaojie Li, Motasem Alfarra, Hasan Hammoud, Adel Bibi, Philip Torr, Bernard Ghanem

Decoupled Parallel Backpropagation with Convergence Guarantee

The backpropagation algorithm is indispensable for the training of feedforward neural networks. It requires propagating error gradients sequentially from the output layer all the way back to the input layer. The backward locking in…

Machine Learning · Computer Science 2018-07-24 Zhouyuan Huo, Bin Gu, Qian Yang, Heng Huang

Fast Parametric Learning with Activation Memorization

Neural networks trained with backpropagation often struggle to identify classes that have been observed a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance…

Machine Learning · Computer Science 2018-03-28 Jack W Rae, Chris Dyer, Peter Dayan, Timothy P Lillicrap

Layer-Parallel Training of Deep Residual Neural Networks

Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be…

Optimization and Control · Mathematics 2019-07-26 S. Günther, L. Ruthotto, J. B. Schroder, E. C. Cyr, N. R. Gauger

Optimizer Fusion: Efficient Training with Better Locality and Parallelism

Machine learning frameworks adopt iterative optimizers to train neural networks. Conventional eager execution separates the updating of trainable parameters from forward and backward computations. However, this approach introduces…

Machine Learning · Computer Science 2021-04-02 Zixuan Jiang, Jiaqi Gu, Mingjie Liu, Keren Zhu, David Z. Pan