Related papers: Accumulated Decoupled Learning: Mitigating Gradien…

200 papers

Despite the notable success of deep neural networks (DNNs) in solving complex tasks, the training process still poses considerable challenges. A primary obstacle is the substantial time required for training, particularly as high…

Machine Learning · Computer Science 2025-09-09 Viet Hoang Pham, Hyo-Sung Ahn

A commonly cited inefficiency of neural network training using back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can…

Machine Learning · Computer Science 2021-06-14 Eugene Belilovsky, Louis Leconte, Lucas Caccia, Michael Eickenberg, Edouard Oyallon

A commonly cited inefficiency of neural network training by back-propagation is the update locking problem: each layer must wait for the signal to propagate through the full network before updating. Several alternatives that can alleviate…

Machine Learning · Computer Science 2020-06-23 Eugene Belilovsky, Michael Eickenberg, Edouard Oyallon

Adaptive optimization methods have been widely used in deep learning. They scale the learning rates adaptively according to the past gradient, which has been shown to be effective in accelerating convergence. However, they suffer from…

Machine Learning · Computer Science 2021-07-06 Hongwei Zhang, Weidong Zou, Hongbo Zhao, Qi Ming, Tijin Yan, Yuanqing Xia, Weipeng Cao

Deep neural networks have been shown to achieve state-of-the-art performance in several machine learning tasks. Stochastic Gradient Descent (SGD) is the preferred optimization algorithm for training these networks and asynchronous SGD…

Machine Learning · Computer Science 2016-04-06 Wei Zhang, Suyog Gupta, Xiangru Lian, Ji Liu

In asynchronous federated learning (FL), client devices send updates to a central server at varying times based on their computational speed, often using stale versions of the global model. This staleness can degrade the convergence and…

Machine Learning · Computer Science 2026-03-10 Patrick Wilhelm, Odej Kao

Federated Learning (FL) has made significant progress recently, enabling collaborative model training on distributed data over edge devices. Iterative gradient or model exchanges between devices and the centralized server in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-19 Ji Liu, Tianshi Che, Yang Zhou, Ruoming Jin, Huaiyu Dai, Dejing Dou, Patrick Valduriez

Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks. However, current methods tend to overfit to seen categories, thereby limiting their…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Chen Xu, Yuhan Zhu, Guozhen Zhang, Haocheng Shen, Yixuan Liao, Xiaoxin Chen, Gangshan Wu, Limin Wang

In recent years, even though Stochastic Gradient Descent (SGD) and its variants are well known for training neural networks, they suffer from limitations such as the lack of theoretical guarantees, vanishing gradients, and excessive…

Optimization and Control · Mathematics 2022-02-17 Junxiang Wang, Hongyi Li, Liang Zhao

Recent advances in deep learning are driven by the growing scale of computation, data, and models. However, efficiently training large-scale models on distributed systems requires an intricate combination of data, operator, and pipeline…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-22 Jinfan Chen, Shigang Li, Ran Gun, Jinhui Yuan, Torsten Hoefler

Asynchronous distributed stochastic gradient descent methods have trouble converging because of stale gradients. A gradient update sent to a parameter server by a client is stale if the parameters used to calculate that gradient have since…

Machine Learning · Statistics 2016-01-18 Augustus Odena

Federated learning (FL), which has gained increasing attention recently, enables distributed devices to train a common machine learning (ML) model for intelligent inference cooperatively without data sharing. However, problems in practical…

Machine Learning · Computer Science 2022-11-01 Yujie Zhou, Zhidu Li, Tong Tang, Ruyan Wang

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science 2025-02-10 Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney

Training the deep convolutional neural network for computer vision problems is slow and inefficient, especially when it is large and distributed across multiple devices. The inefficiency is caused by the backpropagation algorithm's forward…

Machine Learning · Computer Science 2022-01-20 An Xu, Zhouyuan Huo, Heng Huang

Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new…

Machine Learning · Computer Science 2024-02-14 Heinrich van Deventer, Anna Sergeevna Bosman

Adopting large-scale AI models in enterprise information systems is often hindered by high training costs and long development cycles, posing a significant managerial challenge. The standard end-to-end backpropagation (BP) algorithm is a…

Machine Learning · Computer Science 2026-02-04 Ming-Yao Ho, Cheng-Kai Wang, You-Teng Lin, Hung-Hsuan Chen

A significant challenge in achieving ubiquitous Artificial Intelligence is the limited ability of models to rapidly learn new information in real-world scenarios where data follows long-tailed distributions, all while avoiding forgetting…

Machine Learning · Computer Science 2024-04-09 Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu

Distributed Deep Learning (DDL) is essential for large-scale Deep Learning (DL) training. Synchronous Stochastic Gradient Descent (SSGD) is the de facto DDL optimization method. Using a sufficiently large batch size is critical to…

Machine Learning · Computer Science 2021-12-03 Wei Zhang, Mingrui Liu, Yu Feng, Xiaodong Cui, Brian Kingsbury, Yuhai Tu

Reinforcement learning (RL) is a dominant paradigm for training autonomous agents, yet these agents often exhibit poor generalization, failing to adapt to scenarios not seen during training. In this work, we identify a fundamental cause of…

Artificial Intelligence · Computer Science 2026-01-16 Jingyu Liu, Xiaopeng Wu, Jingquan Peng, Kehan Chen, Chuan Yu, Lizhong Ding, Yong Liu

Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing. For fast-increasing applications and data amounts, distributed learning is a…

Machine Learning · Computer Science 2022-02-08 Hao Chen, Yu Ye, Ming Xiao, Mikael Skoglund