Related papers: Accumulated Decoupled Learning: Mitigating Gradien…

Distributed Deep Learning using Stochastic Gradient Staleness

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still faces considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham, Hyo-Sung Ahn

Decoupled Greedy Learning of CNNs for Synchronous and Asynchronous Distributed Learning

A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can…

Machine Learning · Computer Science 2021-06-14 Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon

Decoupled Greedy Learning of CNNs

A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate…

Machine Learning · Computer Science 2020-06-23 Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

AdaL: Adaptive Gradient Transformation Contributes to Convergences and Generalizations

Adaptive optimization methods have been widely used in deep learning. They scale the learning rates adaptively according to past gradients, which has been shown to be effective at accelerating convergence. However, they suffer from…

Machine Learning · Computer Science 2021-07-06 Hongwei Zhang, Weidong Zou, Hongbo Zhao, Qi Ming, Tijin Yan, Yuanqing Xia, Weipeng Cao

Staleness-aware Async-SGD for Distributed Deep Learning

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang, Suyog Gupta, Xiangru Lian, Ji Liu

Revisiting Gradient Staleness: Evaluating Distance Metrics for Asynchronous Federated Learning Aggregation

In asynchronous federated learning (FL), client devices send updates to a central server at varying times based on their computational speed, often using stale versions of the global model. This staleness can degrade the convergence and…

Machine Learning · Computer Science 2026-03-10 Patrick Wilhelm, Odej Kao

AEDFL: Efficient Asynchronous Decentralized Federated Learning with Heterogeneous Devices

Federated Learning (FL) has made significant progress recently, enabling collaborative model training on distributed data over edge devices. Iterative gradient or model exchanges between devices and the centralized server in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-19 Ji Liu, Tianshi Che, Yang Zhou, Ruoming Jin, Huaiyu Dai, Dejing Dou, Patrick Valduriez

DPL: Decoupled Prompt Learning for Vision-Language Models

Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks. However, current methods tend to overfit to seen categories, thereby limiting their…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Chen Xu, Yuhan Zhu, Guozhen Zhang, Haocheng Shen, Yixuan Liao, Xiaoxin Chen, Gangshan Wu, Limin Wang

Accelerated Gradient-free Neural Network Training by Multi-convex Alternating Optimization

In recent years, even though Stochastic Gradient Descent (SGD) and its variants are well known for training neural networks, they suffer from limitations such as the lack of theoretical guarantees, vanishing gradients, and excessive…

Optimization and Control · Mathematics 2022-02-17 Junxiang Wang, Hongyi Li, Liang Zhao

AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost

Recent advances in deep learning are driven by the growing scale of computation, data, and models. However, efficiently training large-scale models on distributed systems requires an intricate combination of data, operator, and pipeline…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-22 Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler

Faster Asynchronous SGD

Asynchronous distributed stochastic gradient descent methods have trouble converging because of stale gradients. A gradient update sent to a parameter server by a client is stale if the parameters used to calculate that gradient have since…

Machine Learning · Statistics 2016-01-18 Augustus Odena

Depersonalized Federated Learning: Tackling Statistical Heterogeneity by Alternating Stochastic Gradient Descent

Federated learning (FL), which has gained increasing attention recently, enables distributed devices to train a common machine learning (ML) model for intelligent inference cooperatively without data sharing. However, problems in practical…

Machine Learning · Computer Science 2022-11-01 Yujie Zhou, Zhidu Li, Tong Tang, Ruyan Wang

Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science 2025-02-10 Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney

On the Acceleration of Deep Learning Model Parallelism with Staleness

Training the deep convolutional neural network for computer vision problems is slow and inefficient, especially when it is large and distributed across multiple devices. The inefficiency is caused by the backpropagation algorithm's forward…

Machine Learning · Computer Science 2022-01-20 An Xu, Zhouyuan Huo, Heng Huang

Distal Interference: Exploring the Limits of Model-Based Continual Learning

Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new…

Machine Learning · Computer Science 2024-02-14 Heinrich van Deventer, Anna Sergeevna Bosman

SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model Parallelism

Adopting large-scale AI models in enterprise information systems is often hindered by high training costs and long development cycles, posing a significant managerial challenge. The standard end-to-end backpropagation (BP) algorithm is a…

Machine Learning · Computer Science 2026-02-04 Ming-Yao Ho, Cheng-Kai Wang, You-Teng Lin, Hung-Hsuan Chen

DELTA: Decoupling Long-Tailed Online Continual Learning

A significant challenge in achieving ubiquitous Artificial Intelligence is the limited ability of models to rapidly learn new information in real-world scenarios where data follows long-tailed distributions, all while avoiding forgetting…

Machine Learning · Computer Science 2024-04-09 Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu

Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent

Distributed Deep Learning (DDL) is essential for large-scale Deep Learning (DL) training. Synchronous Stochastic Gradient Descent (SSGD) is the de facto DDL optimization method. Using a sufficiently large batch size is critical to…

Machine Learning · Computer Science 2021-12-03 Wei Zhang, Mingrui Liu, Yu Feng, Xiaodong Cui, Brian Kingsbury, Yuhai Tu

Gradient Coupling: The Hidden Barrier to Generalization in Agentic Reinforcement Learning

Reinforcement learning (RL) is a dominant paradigm for training autonomous agents, yet these agents often exhibit poor generalization, failing to adapt to scenarios not seen during training. In this work, we identify a fundamental cause of…

Artificial Intelligence · Computer Science 2026-01-16 Jingyu Liu, Xiaopeng Wu, Jingquan Peng, Kehan Chen, Chuan Yu, Lizhong Ding, Yong Liu

Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning

Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing. For fast-increasing applications and data amounts, distributed learning is a…

Machine Learning · Computer Science 2022-02-08 Hao Chen, Yu Ye, Ming Xiao, Mikael Skoglund