Related papers: Quantized Epoch-SGD for Communication-Efficient Di…

200 papers

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks.…

Machine Learning · Computer Science 2017-12-07 Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

We propose LQ-SGD (Low-Rank Quantized Stochastic Gradient Descent), a communication-efficient gradient compression algorithm designed for distributed training. LQ-SGD builds on PowerSGD by incorporating the low-rank…

Machine Learning · Computer Science 2025-06-24 Hongyang Li, Lincen Bai, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry

Gradient quantization is an emerging technique for reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach…

Machine Learning · Computer Science 2021-08-02 Guangfeng Yan, Shao-Lun Huang, Tian Lan, Linqi Song

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression…

Machine Learning · Computer Science 2021-05-05 Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression…

Machine Learning · Computer Science 2021-05-24 Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Stochastic Gradient Descent (SGD) is the key learning algorithm for many machine learning tasks. Because of its computational costs, there is a growing interest in accelerating SGD on HPC resources like GPU clusters. However, the…

Machine Learning · Computer Science 2021-01-20 Peng Jiang, Gagan Agrawal

We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…

Optimization and Control · Mathematics 2021-12-28 Yujie Tang, Vikram Ramanathan, Junshan Zhang, Na Li

To address the communication bottleneck challenge in distributed learning, our work introduces a novel two-stage quantization strategy designed to enhance the communication efficiency of distributed Stochastic Gradient Descent (SGD). The…

Machine Learning · Computer Science 2024-02-05 Guangfeng Yan, Tan Li, Yuanzhang Xiao, Congduan Li, Linqi Song

This paper develops a communication-efficient algorithm to solve the stochastic optimization problem defined over a distributed network, aiming at reducing the burdensome communication in applications such as distributed machine…

Machine Learning · Statistics 2020-01-06 Weiyu Li, Tianyi Chen, Liping Li, Zhaoxian Wu, Qing Ling

With the increase in the amount of data and the expansion of model scale, distributed parallel training becomes an important and successful technique to address the optimization challenges. Nevertheless, although distributed stochastic…

Machine Learning · Computer Science 2019-09-23 Shuheng Shen, Linli Xu, Jingchang Liu, Xianfeng Liang, Yifei Cheng

Recent developments on large-scale distributed machine learning applications, e.g., deep neural networks, benefit enormously from the advances in distributed non-convex optimization techniques, e.g., distributed Stochastic Gradient Descent…

Optimization and Control · Mathematics 2019-05-13 Hao Yu, Rong Jin, Sen Yang

Communication is a crucial phase in the context of distributed training. Because the parameter server (PS) frequently experiences network congestion, recent studies have found that training paradigms without a centralized server outperform the…

Optimization and Control · Mathematics 2020-12-17 Feijie Wu, Shiqi He, Yutong Yang, Haozhao Wang, Zhihao Qu, Song Guo, Weihua Zhuang

Stochastic Gradient Descent (SGD) is the most popular algorithm for training deep neural networks (DNNs). As larger networks and datasets cause longer training times, training on distributed systems is common and distributed SGD variants,…

Machine Learning · Computer Science 2019-06-17 Kwangmin Yu, Thomas Flynn, Shinjae Yoo, Nicholas D'Imperio

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi, Martin Jaggi

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu

With the rapid increase of big data, distributed Machine Learning (ML) has been widely applied in training large-scale models. Stochastic Gradient Descent (SGD) is arguably the workhorse algorithm of ML. Distributed ML models trained by SGD…

Machine Learning · Computer Science 2021-12-09 Keyu Yang, Lu Chen, Zhihao Zeng, Yunjun Gao

Distributed training is an effective way to accelerate the training process of large-scale deep learning models. However, the parameter exchange and synchronization of distributed stochastic gradient descent introduce a large amount of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-16 LingFei Dai, Boyu Diao, Chao Li, Yongjun Xu

Distributed training enables large-scale deep learning, but suffers from high communication overhead, especially as models and datasets grow. Gradient compression, particularly quantization, is a promising approach to mitigate this…

Machine Learning · Computer Science 2025-07-30 Jihao Xin, Marco Canini, Peter Richtárik, Samuel Horváth

Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders…

Machine Learning · Computer Science 2018-11-30 Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi