Related papers: Communication-efficient Algorithm for Distributed …

Efficient Distributed Learning with Sparsity

We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a…

Machine Learning · Statistics 2016-05-26 Jialei Wang , Mladen Kolar , Nathan Srebro , Tong Zhang

Gradient Sparsification for Communication-Efficient Distributed Optimization

Modern large scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead for exchanging information such as…

Machine Learning · Computer Science 2017-10-31 Jianqiao Wangni , Jialei Wang , Ji Liu , Tong Zhang

Distributed Learning with Sparse Communications by Identification

In distributed optimization for large-scale learning, a major performance limitation comes from the communications between the different entities. When computations are performed by workers on local data while a coordinator machine…

Optimization and Control · Mathematics 2020-06-26 Dmitry Grishchenko , Franck Iutzeler , Jérôme Malick , Massih-Reza Amini

Truncated Non-Uniform Quantization for Distributed SGD

To address the communication bottleneck challenge in distributed learning, our work introduces a novel two-stage quantization strategy designed to enhance the communication efficiency of distributed Stochastic Gradient Descent (SGD). The…

Machine Learning · Computer Science 2024-02-05 Guangfeng Yan , Tan Li , Yuanzhang Xiao , Congduan Li , Linqi Song

Stabilized Sparse Online Learning for Sparse Data

Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems. Langford et al. (2009) introduce a sparse online learning method to induce sparsity via truncated gradient. With high-dimensional…

Machine Learning · Statistics 2017-05-10 Yuting Ma , Tian Zheng

Sparse Communication for Training Deep Networks

Synchronous stochastic gradient descent (SGD) is the most common method used for distributed training of deep learning models. In this algorithm, each worker shares its local gradients with others and updates the parameters using the…

Machine Learning · Computer Science 2020-09-22 Negar Foroutan Eghlidi , Martin Jaggi

Communication Efficient LLM Pre-training with SparseLoCo

Communication-efficient distributed training algorithms have received considerable interest recently due to their benefits for training Large Language Models (LLMs) in bandwidth-constrained settings, such as across datacenters and over the…

Machine Learning · Computer Science 2025-11-07 Amir Sarfi , Benjamin Thérien , Joel Lidin , Eugene Belilovsky

Communication-efficient Variance-reduced Stochastic Gradient Descent

We consider the problem of communication efficient distributed optimization where multiple nodes exchange important algorithm information in every iteration to solve large problems. In particular, we focus on the stochastic variance-reduced…

Machine Learning · Computer Science 2020-03-16 Hossein S. Ghadikolaei , Sindri Magnusson

An efficient distributed learning algorithm based on effective local functional approximations

Scalable machine learning over big data is an important problem that is receiving a lot of attention in recent years. On popular distributed environments such as Hadoop running on a cluster of commodity machines, communication costs are…

Machine Learning · Computer Science 2015-03-18 Dhruv Mahajan , Nikunj Agrawal , S. Sathiya Keerthi , S. Sundararajan , Leon Bottou

Sparsified SGD with Memory

Huge scale machine learning problems are nowadays tackled by distributed optimization algorithms, i.e. algorithms that leverage the compute power of many devices for training. The communication overhead is a key bottleneck that hinders…

Machine Learning · Computer Science 2018-11-30 Sebastian U. Stich , Jean-Baptiste Cordonnier , Martin Jaggi

Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection

Distributed learning techniques such as federated learning have enabled multiple workers to train machine learning models together to reduce the overall training time. However, current distributed training algorithms (centralized or…

Machine Learning · Computer Science 2020-02-25 Zhenheng Tang , Shaohuai Shi , Xiaowen Chu

Sparse Online Learning via Truncated Gradient

We propose a general method called truncated gradient to induce sparsity in the weights of online learning algorithms with convex loss functions. This method has several essential properties: The degree of sparsity is continuous -- a…

Machine Learning · Computer Science 2008-07-04 John Langford , Lihong Li , Tong Zhang

Debiased distributed learning for sparse partial linear models in high dimensions

Although various distributed machine learning schemes have been proposed recently for pure linear models and fully nonparametric models, little attention has been paid on distributed optimization for semi-paramemetric models with…

Machine Learning · Statistics 2019-11-05 Shaogao Lv , Heng Lian

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Gradient-based optimization methods implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the high communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-11 Xiaoge Deng , Dongsheng Li , Tao Sun , Xicheng Lu

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates

As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer…

Machine Learning · Computer Science 2024-07-10 Fred Lu , Ryan R. Curtin , Edward Raff , Francis Ferraro , James Holt

Distributed Sparse Linear Regression under Communication Constraints

In multiple domains, statistical tasks are performed in distributed settings, with data split among several end machines that are connected to a fusion center. In various applications, the end machines have limited bandwidth and power, and…

Machine Learning · Computer Science 2026-01-05 Rodney Fonseca , Boaz Nadler

Theory of Dual-sparse Regularized Randomized Reduction

In this paper, we study randomized reduction methods, which reduce high-dimensional features into low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification.…

Machine Learning · Computer Science 2015-07-21 Tianbao Yang , Lijun Zhang , Rong Jin , Shenghuo Zhu

SparDL: Distributed Deep Learning Training with Efficient Sparse Communication

Top-k sparsification has recently been widely used to reduce the communication volume in distributed deep learning. However, due to the Sparse Gradient Accumulation (SGA) dilemma, the performance of top-k sparsification still has…

Machine Learning · Computer Science 2024-02-26 Minjun Zhao , Yichen Yin , Yuren Mao , Qing Liu , Lu Chen , Yunjun Gao

Decentralized Learning with Lazy and Approximate Dual Gradients

This paper develops algorithms for decentralized machine learning over a network, where data are distributed, computation is localized, and communication is restricted between neighbors. A line of recent research in this area focuses on…

Optimization and Control · Mathematics 2020-08-06 Yanli Liu , Yuejiao Sun , Wotao Yin

Quantum Algorithm for Sparse Online Learning with Truncated Gradient Descent

Logistic regression, the Support Vector Machine (SVM), and least squares are well-studied methods in the statistical and computer science community, with various practical applications. High-dimensional data arriving on a real-time basis…

Machine Learning · Computer Science 2024-11-07 Debbie Lim , Yixian Qiu , Patrick Rebentrost , Qisheng Wang