Related papers: Distributed Proximal Gradient Algorithm for Partia…

S-NEAR-DGD: A Flexible Distributed Stochastic Gradient Method for Inexact Communication

We present and analyze a stochastic distributed method (S-NEAR-DGD) that can tolerate inexact computation and inaccurate information exchange to alleviate the problems of costly gradient evaluations and bandwidth-limited communication in…

Optimization and Control · Mathematics 2021-02-02 Charikleia Iakovidou, Ermin Wei

Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Gradient-based optimization methods implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the high communication…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-11 Xiaoge Deng, Dongsheng Li, Tao Sun, Xicheng Lu

A Model Parallel Proximal Stochastic Gradient Algorithm for Partially Asynchronous Systems

Large models are prevalent in modern machine learning scenarios, including deep learning, recommender systems, etc., which can have millions or even billions of parameters. Parallel algorithms have become an essential solution technique to…

Machine Learning · Computer Science 2018-10-23 Rui Zhu, Di Niu

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal…

Optimization and Control · Mathematics 2016-05-24 Yitan Li, Linli Xu, Xiaowei Zhong, Qing Ling

A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks

In distributed training of deep neural networks, people usually run Stochastic Gradient Descent (SGD) or its variants on each machine and communicate with other machines periodically. However, SGD might converge slowly in training some deep…

Machine Learning · Computer Science 2022-10-14 Mingrui Liu, Zhenxun Zhuang, Yunwei Lei, Chunyang Liao

On the Convergence of Inexact Gradient Descent with Controlled Synchronization Steps

We develop a gradient-like algorithm to minimize a sum of peer objective functions based on coordination through a peer interconnection network. The coordination admits two stages: the first is to constitute a gradient, possibly with…

Optimization and Control · Mathematics 2023-07-19 Sandushan Ranaweera, Chathuranga Weeraddana, Prathapasinghe Dharmawansa, Carlo Fischione

DSPG: Decentralized Simultaneous Perturbations Gradient Descent Scheme

Distributed descent-based methods are an essential toolset to solving optimization problems in multi-agent system scenarios. Here the agents seek to optimize a global objective function through mutual cooperation. Oftentimes, cooperation is…

Optimization and Control · Mathematics 2019-08-28 Arunselvan Ramaswamy

A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm

We develop and analyze an asynchronous algorithm for distributed convex optimization when the objective writes a sum of smooth functions, local to each worker, and a non-smooth function. Unlike many existing methods, our distributed…

Optimization and Control · Mathematics 2019-12-13 Konstantin Mishchenko, Franck Iutzeler, Jérôme Malick

Communication-Efficient Robust Federated Learning Over Heterogeneous Datasets

This work investigates fault-resilient federated learning when the data samples are non-uniformly distributed across workers, and the number of faulty workers is unknown to the central server. In the presence of adversarially faulty workers…

Machine Learning · Computer Science 2020-08-20 Yanjie Dong, Georgios B. Giannakis, Tianyi Chen, Julian Cheng, Md. Jahangir Hossain, Victor C. M. Leung

Distributed learning with compressed gradients

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

Stochastic Proximal Gradient Consensus Over Random Networks

We consider solving a convex, possibly stochastic optimization problem over a randomly time-varying multi-agent network. Each agent has access to some local objective function, and it only has unbiased estimates of the gradients of the…

Optimization and Control · Mathematics 2016-11-29 Mingyi Hong, Tsung-Hui Chang

MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization

Prompt engineering is crucial for fully leveraging large language models (LLMs), yet most existing optimization methods follow a single trajectory, resulting in limited adaptability, gradient conflicts, and high computational overhead. We…

Artificial Intelligence · Computer Science 2026-02-04 Yichen Han, Yuhang Han, Siteng Huang, Guanyu Liu, Zhengpeng Zhou, Bojun Liu, Yujia Zhang, Isaac N Shi, Lewei He, Tianyu Shi

Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

We study distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG) for solving decentralized strongly convex stochastic optimization problems where the objective function is distributed over several computational…

Optimization and Control · Mathematics 2021-10-05 Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar, Umut Simsekli, Lingjiong Zhu

Accelerating Distributed Optimization: A Primal-Dual Perspective on Local Steps

In distributed machine learning, efficient training across multiple agents with different data distributions poses significant challenges. Even with a centralized coordinator, current algorithms that achieve optimal communication complexity…

Machine Learning · Computer Science 2024-08-13 Junchi Yang, Murat Yildirim, Qiu Feng

High Throughput Synchronous Distributed Stochastic Gradient Descent

We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-14 Michael Teng, Frank Wood

On the convergence rate of distributed gradient methods for finite-sum optimization under communication delays

Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems,…

Optimization and Control · Mathematics 2019-05-14 Thinh T. Doan, Carolyn L. Beck, R. Srikant

Communication-Efficient Approximate Gradient Coding

Large-scale distributed learning aims at minimizing a loss function $L$ that depends on a training dataset with respect to a $d$-length parameter vector. The distributed cluster typically consists of a parameter server (PS) and multiple…

Information Theory · Computer Science 2026-03-25 Sifat Munim, Aditya Ramamoorthy

Randomized Constraints Consensus for Distributed Robust Mixed-Integer Programming

In this paper, we consider a network of processors aiming at cooperatively solving mixed-integer convex programs subject to uncertainty. Each node only knows a common cost function and its local uncertain constraint set. We propose a…

Optimization and Control · Mathematics 2022-07-19 Mohammadreza Chamanbaz, Giuseppe Notarstefano, Francesco Sasso, Roland Bouffanais

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-25 Dan Alistarh, Christopher De Sa, Nikola Konstantinov

Delay-tolerant distributed Bregman proximal algorithms

Many problems in machine learning write as the minimization of a sum of individual loss functions over the training examples. These functions are usually differentiable but, in some cases, their gradients are not Lipschitz continuous, which…

Optimization and Control · Mathematics 2024-04-29 S. Chraibi, F. Iutzeler, J. Malick, A. Rogozin