English
Related papers

Related papers: Quantized Adam with Error Feedback

200 papers

Distributed adaptive stochastic gradient methods have been widely used for large-scale nonconvex optimization, such as training deep learning models. However, their communication complexity on finding $\varepsilon$-stationary points has…

Machine Learning · Computer Science 2023-08-25 Congliang Chen , Li Shen , Wei Liu , Zhi-Quan Luo

Training generative adversarial networks (GAN) in a distributed fashion is a promising technology since it is contributed to training GAN on a massive of data efficiently in real-world applications. However, GAN is known to be difficult to…

Machine Learning · Computer Science 2020-10-27 Xiaojun Chen , Shu Yang , Li Shen , Xuanrong Pang

Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization…

Machine Learning · Computer Science 2021-09-08 Xiangyi Chen , Belhal Karimi , Weijie Zhao , Ping Li

Due to the explosion in the size of the training datasets, distributed learning has received growing interest in recent years. One of the major bottlenecks is the large communication cost between the central server and the local workers.…

Machine Learning · Computer Science 2022-02-25 Yujia Wang , Lu Lin , Jinghui Chen

Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-06-24 Jinghui Chen , Dongruo Zhou , Yiqi Tang , Ziyan Yang , Yuan Cao , Quanquan Gu

Adaptive gradient optimization methods, such as Adam, are prevalent in training deep neural networks across diverse machine learning tasks due to their ability to achieve faster convergence. However, these methods often suffer from…

Machine Learning · Computer Science 2025-02-12 Abulikemu Abuduweili , Changliu Liu

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay. In our experiments on neural networks for image classification, speech recognition, machine translation,…

Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the…

Computer Vision and Pattern Recognition · Computer Science 2018-06-22 Jiaxiang Wu , Weidong Huang , Junzhou Huang , Tong Zhang

Adaptive gradient methods, e.g. \textsc{Adam}, have achieved tremendous success in machine learning. Scaling the learning rate element-wisely by a certain form of second moment estimate of gradients, such methods are able to attain rapid…

Machine Learning · Computer Science 2022-02-10 Yizhou Wang , Yue Kang , Can Qin , Huan Wang , Yi Xu , Yulun Zhang , Yun Fu

We study distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. Recent advances using the distributed gradient…

Optimization and Control · Mathematics 2019-05-14 Thinh T. Doan , Siva Theja Maguluri , Justin Romberg

We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this…

Machine Learning · Computer Science 2021-09-10 Osama A. Hanna , Yahya H. Ezzeldin , Christina Fragouli , Suhas Diggavi

Stochastic gradient descent (SGD) is the main approach for training deep networks: it moves towards the optimum of the cost function by iteratively updating the parameters of a model in the direction of the gradient of the loss evaluated on…

Machine Learning · Computer Science 2021-03-30 Loris Nanni , Gianluca Maguolo , Alessandra Lumini

Distributed data mining is an emerging research topic to effectively and efficiently address hard data mining tasks using big data, which are partitioned and computed on different worker nodes, instead of one centralized server.…

Machine Learning · Computer Science 2022-10-17 Wenhan Xian , Feihu Huang , Heng Huang

Adam-type optimizers, as a class of adaptive moment estimation methods with the exponential moving average scheme, have been successfully used in many applications of deep learning. Such methods are appealing due to the capability on…

Machine Learning · Computer Science 2020-12-17 Bingxin Zhou , Xuebin Zheng , Junbin Gao

Adaptive gradient methods for stochastic optimization adjust the learning rate for each parameter locally. However, there is also a global learning rate which must be tuned in order to get the best performance. In this paper, we present a…

Machine Learning · Computer Science 2018-06-12 Hiroaki Hayashi , Jayanth Koushik , Graham Neubig

In this paper, we design a novel distributed learning algorithm using stochastic compressed communications. In detail, we pursue a modular approach, merging ADMM and a gradient-based approach, benefiting from the robustness of the former…

Optimization and Control · Mathematics 2025-07-01 Guido Carnevale , Nicola Bastianello

Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning, especially in bandwidth-limited settings and high-dimensional models. Gradient quantization is an effective…

Machine Learning · Computer Science 2021-02-10 Divyansh Jhunjhunwala , Advait Gadhikar , Gauri Joshi , Yonina C. Eldar

A standard approach in large scale machine learning is distributed stochastic gradient training, which requires the computation of aggregated stochastic gradients over multiple nodes on a network. Communication is a major bottleneck in such…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-24 Hanlin Tang , Xiangru Lian , Chen Yu , Tong Zhang , Ji Liu

Adam is a popular and widely used adaptive gradient method in deep learning, which has also received tremendous focus in theoretical research. However, most existing theoretical work primarily analyzes its full-batch version, which differs…

Machine Learning · Computer Science 2025-10-14 Xuan Tang , Han Zhang , Yuan Cao , Difan Zou

Due to the substantial computational cost, training state-of-the-art deep neural networks for large-scale datasets often requires distributed training using multiple computation workers. However, by nature, workers need to frequently…

Machine Learning · Computer Science 2018-02-21 Yusuke Tsuzuku , Hiroto Imachi , Takuya Akiba
‹ Prev 1 2 3 10 Next ›