Related papers: Robust and Communication-Efficient Collaborative L…
Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, existing methods often suffer from communication…
Decentralized learning algorithms empower interconnected devices to share data and computational resources to collaboratively train a machine learning model without the aid of a central coordinator. In the case of heterogeneous data…
We consider the problem of decentralized consensus optimization, where the sum of $n$ smooth and strongly convex functions are minimized over $n$ distributed agents that form a connected network. In particular, we consider the case that the…
Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…
Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform…
Decentralized optimization with time-varying networks is an emerging paradigm in machine learning. It saves remarkable communication overhead in large-scale deep training and is more robust in wireless scenarios especially when nodes are…
This paper develops algorithms for decentralized machine learning over a network, where data are distributed, computation is localized, and communication is restricted between neighbors. A line of recent research in this area focuses on…
In this paper, we consider a large network containing many regions such that each region is equipped with a worker with some data processing and communication capability. For such a network, some workers may become stragglers due to the…
Decentralized optimization is a promising parallel computation paradigm for large-scale data analytics and machine learning problems defined over a network of nodes. This paper is concerned with decentralized non-convex composite problems…
We study distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. This imperfect communication poses a fundamental…
One of the most widely used methods for solving large-scale stochastic optimization problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that result from parallelizing stochastic gradient descent…
Decentralized stochastic gradient descent (SGD) is a driving engine for decentralized federated learning (DFL). The performance of decentralized SGD is jointly influenced by inter-node communications and local updates. In this paper, we…
Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks.…
Distributed stochastic non-convex optimization problems have recently received attention due to the growing interest of signal processing, computer vision, and natural language processing communities in applications deployed over…
Decentralized optimization is typically studied under the assumption of noise-free transmission. However, real-world scenarios often involve the presence of noise due to factors such as additive white Gaussian noise channels or…
We consider large scale distributed optimization over a set of edge devices connected to a central server, where the limited communication bandwidth between the server and edge devices imposes a significant bottleneck for the optimization…
In this paper, we propose to solve a regularized distributionally robust learning problem in the decentralized setting, taking into account the data distribution shift. By adding a Kullback-Liebler regularization function to the robust…
Decentralized learning enables edge users to collaboratively train models by exchanging information via device-to-device communication, yet prior works have been limited to wireless networks with fixed topologies and reliable workers. In…
Gradient quantization is an emerging technique in reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach…