
Related papers: Trajectory Normalized Gradients for Distributed Op…

200 papers

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku, Hiroto Imachi, Takuya Akiba

This paper proposes a prediction-based gradient compression method for distributed learning with event-triggered communication. Our goal is to reduce the amount of information transmitted from the distributed agents to the parameter server…

Information Theory · Computer Science 2024-10-04 Adrian Edin, Zheng Chen, Michel Kieffer, Mikael Johansson

Asynchronous computation and gradient compression have emerged as two key techniques for achieving scalability in distributed optimization for large-scale machine learning. This paper presents a unified analysis framework for distributed…

Optimization and Control · Mathematics 2018-11-30 Sarit Khirirat, Hamid Reza Feyzmahdavian, Mikael Johansson

Communication compression techniques are of growing interest for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent…

Optimization and Control · Mathematics 2022-05-26 Yiwei Liao, Zhuorui Li, Kun Huang, Shi Pu

Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent…

Machine Learning · Computer Science 2020-10-08 Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik

Achieving communication efficiency in decentralized machine learning has been attracting significant attention, with communication compression recognized as an effective technique in algorithm design. This paper takes a first step to…

Machine Learning · Computer Science 2023-05-18 Boyue Li, Yuejie Chi

Training large neural networks is time consuming. To speed up the process, distributed training is often used. One of the largest bottlenecks in distributed training is communicating gradients across different nodes. Different gradient…

Machine Learning · Computer Science 2022-10-03 William Zou, Hans De Sterck, Jun Liu

We propose tensorial neural networks (TNNs), a generalization of existing neural networks by extending tensor operations on low order operands to those on high order ones. The problem of parameter learning is challenging, as it corresponds…

Machine Learning · Statistics 2018-12-11 Jiahao Su, Jingling Li, Bobby Bhattacharjee, Furong Huang

Compression techniques are essential in distributed optimization and learning algorithms with high-dimensional model parameters, particularly in scenarios with tight communication constraints such as limited bandwidth. This article presents…

Optimization and Control · Mathematics 2025-10-27 Souvik Das, Subhrakanti Dey

Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification,…

Machine Learning · Computer Science 2020-11-02 Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos

Gradient tracking methods have emerged as one of the most popular approaches for solving decentralized optimization problems over networks. In this setting, each node in the network has a portion of the global objective function, and the…

Optimization and Control · Mathematics 2023-11-27 Albert S. Berahas, Raghu Bollapragada, Shagun Gupta

Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each…

Machine Learning · Computer Science 2018-08-09 Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar

We first propose a decentralized proximal stochastic gradient tracking method (DProxSGT) for nonconvex stochastic composite problems, with data heterogeneously distributed on multiple workers in a decentralized connected network. To save…

Optimization and Control · Mathematics 2023-03-01 Yonggui Yan, Jie Chen, Pin-Yu Chen, Xiaodong Cui, Songtao Lu, Yangyang Xu

The goal of this thesis is to study the compression problems arising in distributed computing systematically. In the first part of the thesis, we study gradient compression for distributed first-order optimization. We begin by establishing…

Information Theory · Computer Science 2023-01-12 Prathamesh Mayekar

In this paper, the problem of optimal gradient lossless compression in Deep Neural Network (DNN) training is considered. Gradient compression is relevant in many distributed DNN training scenarios, including the recently popular federated…

Machine Learning · Computer Science 2021-11-16 Zhong-Jing Chen, Eduin E. Hernandez, Yu-Chih Huang, Stefano Rini

We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this…

Machine Learning · Computer Science 2021-09-10 Osama A. Hanna, Yahya H. Ezzeldin, Christina Fragouli, Suhas Diggavi

Communication overhead severely hinders the scalability of distributed machine learning systems. Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of the distributed training.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-19 Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several…

Machine Learning · Computer Science 2021-03-18 Lusine Abrahamyan, Yiming Chen, Giannis Bekoulis, Nikos Deligiannis

The performance and efficiency of distributed training of Deep Neural Networks highly depend on the performance of gradient averaging among all participating nodes, which is bounded by the communication between nodes. There are two major…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-10 Linnan Wang, Wei Wu, Junyu Zhang, Hang Liu, George Bosilca, Maurice Herlihy, Rodrigo Fonseca

In this paper, we study the distributed nonconvex optimization problem, which aims to minimize the average value of the local nonconvex cost functions using local information exchange. To reduce the communication overhead, we introduce…

Optimization and Control · Mathematics 2025-02-12 Lei Xu, Xinlei Yi, Guanghui Wen, Yang Shi, Karl H. Johansson, Tao Yang