Related papers: Adaptive Gradient Quantization for Data-Parallel S…

200 papers

Gradient quantization is an emerging technique for reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach…

Machine Learning · Computer Science 2021-08-02 Guangfeng Yan, Shao-Lun Huang, Tian Lan, Linqi Song

Massive amounts of data have made the training of large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to tackle…

Machine Learning · Computer Science 2022-03-31 S Vineeth

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks.…

Machine Learning · Computer Science 2017-12-07 Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-23 Shaohang Jia, Zhiyong Huang, Zhi Yu, Mingyang Hou, Shuai Miao, Han Yang

Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning, especially in bandwidth-limited settings and high-dimensional models. Gradient quantization is an effective…

Machine Learning · Computer Science 2021-02-10 Divyansh Jhunjhunwala, Advait Gadhikar, Gauri Joshi, Yonina C. Eldar

A popular line of network compression approaches is Quantization-Aware Training (QAT), which accelerates the forward pass during neural network training and inference. However, few prior efforts have been made to quantize and…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Kaixin Xu, Alina Hui Xiu Lee, Ziyuan Zhao, Zhe Wang, Min Wu, Weisi Lin

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression…

Machine Learning · Computer Science 2021-05-05 Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression…

Machine Learning · Computer Science 2021-05-24 Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Stochastic gradient methods (SGMs) are the predominant approaches to train deep learning models. The adaptive versions (e.g., Adam and AMSGrad) have been extensively used in practice, partly because they achieve faster convergence than the…

Optimization and Control · Mathematics 2022-04-14 Yangyang Xu, Yibo Xu, Yonggui Yan, Colin Sutcher-Shepard, Leopold Grinberg, Jie Chen

Large neural networks require enormous computational clusters of machines. Model-parallel training, in which the model architecture is partitioned sequentially between workers, is a popular approach for training modern models. Information…

Machine Learning · Computer Science 2024-03-27 Mikhail Rudakov, Aleksandr Beznosikov, Yaroslav Kholodov, Alexander Gasnikov

Distributed stochastic gradient descent (SGD) with gradient compression has become a popular communication-efficient solution for accelerating distributed learning. One commonly used method for gradient compression is Top-K sparsification,…

Machine Learning · Computer Science 2023-09-12 Mengzhe Ruan, Guangfeng Yan, Yuanzhang Xiao, Linqi Song, Weitao Xu

Mixed Precision Quantization (MPQ) has become an essential technique for optimizing neural networks by determining the optimal bitwidth per layer. Existing MPQ methods, however, face a major hurdle: they require a computationally expensive…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Lianbo Ma, Jianlun Ma, Yuee Zhou, Guoyang Xie, Qiang He, Zhichao Lu

Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, despite the nice property of fast convergence, have been observed to generalize worse than stochastic gradient descent (SGD)…

Machine Learning · Computer Science 2020-06-24 Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu

Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization…

Machine Learning · Computer Science 2021-09-08 Xiangyi Chen, Belhal Karimi, Weijie Zhao, Ping Li

We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases, known as multi-bit networks (MBNs), which accelerate the inference and reduce the storage for the deployment on…

Computer Vision and Pattern Recognition · Computer Science 2020-07-07 Zhongnan Qu, Zimu Zhou, Yun Cheng, Lothar Thiele

Diffusion models have shown remarkable performance in image synthesis by progressively estimating a smooth transition from a Gaussian distribution of noise to a real image. Unfortunately, their practical deployment is limited by slow…

Machine Learning · Computer Science 2026-03-03 Dung Anh Hoang, Cuong Pham anh Trung Le, Jianfei Cai, Thanh-Toan Do

We study distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. Recent advances using the distributed gradient…

Optimization and Control · Mathematics 2019-05-14 Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

We study COMP-AMS, a distributed optimization framework based on gradient averaging and the adaptive AMSGrad algorithm. Gradient compression with error feedback is applied to reduce the communication cost in the gradient transmission process.…

Machine Learning · Statistics 2022-05-12 Xiaoyun Li, Belhal Karimi, Ping Li

Efficient classical optimizers are crucial in practical implementations of Variational Quantum Algorithms (VQAs). In particular, to make Stochastic Gradient Descent (SGD) resource efficient, adaptive strategies have been proposed to…

Quantum Physics · Physics 2023-02-10 Kosuke Ito

Deep neural networks with lower precision weights and operations at inference time have advantages in terms of the cost of memory space and accelerator power. The main challenge associated with the quantization algorithm is maintaining…

Computer Vision and Pattern Recognition · Computer Science 2022-02-21 Shih-Ting Lin, Zhaofang Li, Yu-Hsiang Cheng, Hao-Wen Kuo, Chih-Cheng Lu, Kea-Tiong Tang