Related papers: Distributed Optimization over Block-Cyclic Data

Stochastic Distributed Optimization for Machine Learning from Decentralized Features

Distributed machine learning has been widely studied in the literature to scale up machine learning model training in the presence of an ever-increasing amount of data. We study distributed machine learning from another perspective, where…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-16 Yaochen Hu, Di Niu, Jianming Yang, Shengping Zhou

Cooperative SGD with Dynamic Mixing Matrices

One of the most common methods to train machine learning algorithms today is the stochastic gradient descent (SGD). In a distributed setting, SGD-based algorithms have been shown to converge theoretically under specific circumstances. A…

Machine Learning · Computer Science 2025-08-22 Soumya Sarkar, Shweta Jain

Distributed Stochastic Optimization via Adaptive SGD

Stochastic convex optimization algorithms are the most popular way to train machine learning models on large-scale data. Scaling up the training process of these models is crucial, but the most popular algorithm, Stochastic Gradient Descent…

Machine Learning · Statistics 2018-10-30 Ashok Cutkosky, Robert Busa-Fekete

99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it

Many popular distributed optimization methods for training machine learning models fit the following template: a local gradient estimate is computed independently by each worker, then communicated to a master, which subsequently performs…

Machine Learning · Computer Science 2019-06-05 Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

Weighted parallel SGD for distributed unbalanced-workload training system

Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal…

Machine Learning · Computer Science 2017-08-17 Cheng Daning, Li Shigang, Zhang Yunquan

RCD-SGD: Resource-Constrained Distributed SGD in Heterogeneous Environment via Submodular Partitioning

The convergence of SGD based distributed training algorithms is tied to the data distribution across workers. Standard partitioning techniques try to achieve equal-sized partitions with per-class population distribution in proportion to the…

Machine Learning · Computer Science 2023-09-20 Haoze He, Parijat Dube

Optimization for Large-Scale Machine Learning with Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2017-04-18 Alexandros Nathan, Diego Klabjan

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms

Recently, there has been growing interest in the study of median-based algorithms for distributed non-convex optimization. Two prominent such algorithms include signSGD with majority vote, an effective approach for communication reduction via…

Machine Learning · Computer Science 2019-06-07 Xiangyi Chen, Tiancong Chen, Haoran Sun, Zhiwei Steven Wu, Mingyi Hong

Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load

In distributed machine learning, a central node outsources computationally expensive calculations to external worker nodes. The properties of optimization procedures like stochastic gradient descent (SGD) can be leveraged to mitigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Maximilian Egger, Serge Kas Hanna, Rawad Bitar

Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training

Distributed asynchronous offline training has received widespread attention in recent years because of its high performance on large-scale data and complex models. As data are distributed from cloud-centric to edge nodes, a big challenge…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-03 Chengjie Li, Ruixuan Li, Haozhao Wang, Yuhua Li, Pan Zhou, Song Guo, Keqin Li

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension

We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by…

Machine Learning · Computer Science 2022-04-29 Yunfei Teng, Wenbo Gao, Francois Chalus, Anna Choromanska, Donald Goldfarb, Adrian Weller

Stochastic Gradient Descent on Highly-Parallel Architectures

There is an increased interest in building data analytics frameworks with advanced algebraic capabilities both in industry and academia. Many of these frameworks, e.g., TensorFlow and BIDMach, implement their compute-intensive primitives in…

Databases · Computer Science 2018-02-27 Yujing Ma, Florin Rusu, Martin Torres

Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization

Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-14 Aditya Devarakonda, Ramakrishnan Kannan

Asynchronous Distributed Semi-Stochastic Gradient Optimization

With the recent proliferation of large-scale learning problems, there has been a lot of interest in distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However,…

Machine Learning · Computer Science 2015-12-07 Ruiliang Zhang, Shuai Zheng, James T. Kwok

Does Worst-Performing Agent Lead the Pack? Analyzing Agent Dynamics in Unified Distributed SGD

Distributed learning is essential to train machine learning algorithms across heterogeneous agents while maintaining data privacy. We conduct an asymptotic analysis of Unified Distributed SGD (UD-SGD), exploring a variety of communication…

Machine Learning · Computer Science 2024-10-30 Jie Hu, Yi-Ting Ma, Do Young Eun

Distributed stochastic optimization with large delays

One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…

Optimization and Control · Mathematics 2021-07-08 Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter W. Glynn, Yinyu Ye

Federated Optimization: Distributed Optimization Beyond the Datacenter

We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed (unevenly) over an extremely large number of nodes, but the goal remains to…

Machine Learning · Computer Science 2015-11-12 Jakub Konečný, Brendan McMahan, Daniel Ramage

Semi-Cyclic Stochastic Gradient Descent

We consider convex SGD updates with a block-cyclic structure, i.e. where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution. This situation arises, e.g., in…

Machine Learning · Computer Science 2019-04-24 Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization

In modern large-scale machine learning applications, the training data are often partitioned and stored on multiple machines. It is customary to employ the "data parallelism" approach, where the aggregated training loss is minimized without…

Machine Learning · Computer Science 2017-08-28 Shun Zheng, Jialei Wang, Fen Xia, Wei Xu, Tong Zhang