Related papers: Position-based Scaled Gradient for Model Quantizat…

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization

We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve…

Machine Learning · Computer Science 2020-02-19 Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

Learning Multimodal Fixed-Point Weights using Gradient Descent

Due to their high computational complexity, deep neural networks are still limited to powerful processing units. To promote a reduced model complexity by dint of low-bit fixed-point quantization, we propose a gradient-based optimization…

Machine Learning · Computer Science 2019-07-18 Lukas Enderich, Fabian Timm, Lars Rosenbaum, Wolfram Burgard

Bounded perturbation resilience of projected scaled gradient methods

We investigate projected scaled gradient (PSG) methods for convex minimization problems. These methods perform a descent step along a diagonally scaled gradient direction followed by a feasibility regaining step via orthogonal projection…

Optimization and Control · Mathematics 2015-07-28 W. Jin, Y. Censor, M. Jiang

Escaping Saddle Points with Compressed SGD

Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck…

Machine Learning · Computer Science 2021-05-24 Dmitrii Avdiukhin, Grigory Yaroslavtsev

Online Statistical Inference for Parameters Estimation with Linear-Equality Constraints

Stochastic gradient descent (SGD) and projected stochastic gradient descent (PSGD) are scalable algorithms to compute model parameters in unconstrained and constrained optimization problems. In comparison with SGD, PSGD forces its iterative…

Machine Learning · Statistics 2022-03-24 Ruiqi Liu, Mingao Yuan, Zuofeng Shang

Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement

While pruning methods effectively maintain model performance without extra training costs, they often focus solely on preserving crucial connections, overlooking the impact of pruned weights on subsequent fine-tuning or distillation,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Hyeonjin Kim, Jaejun Yoo

On the Convergence of A Data-Driven Regularized Stochastic Gradient Descent for Nonlinear Ill-Posed Problems

Stochastic gradient descent (SGD) is a promising method for solving large-scale inverse problems, due to its excellent scalability with respect to data size. In this work, we analyze a new data-driven regularized stochastic gradient descent…

Numerical Analysis · Mathematics 2024-09-30 Zehui Zhou

Quickly Finding the Best Linear Model in High Dimensions

We study the problem of finding the best linear model that can minimize least-squares loss given a data-set. While this problem is trivial in the low dimensional regime, it becomes more interesting in high dimensions where the population…

Machine Learning · Computer Science 2021-02-09 Yahya Sattar, Samet Oymak

Unbiased Single-scale and Multi-scale Quantizers for Distributed Optimization

Massive amounts of data have made training large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to tackle…

Machine Learning · Computer Science 2022-03-31 S Vineeth

SQS: Bayesian DNN Compression through Sparse Quantized Sub-distributions

Compressing large-scale neural networks is essential for deploying models on resource-constrained devices. Most existing methods adopt weight pruning or low-bit quantization individually, often resulting in suboptimal compression rates to…

Machine Learning · Computer Science 2025-10-13 Ziyi Wang, Nan Jiang, Guang Lin, Qifan Song

A Granger-Causal Perspective on Gradient Descent with Application to Pruning

Stochastic Gradient Descent (SGD) is the main approach to optimizing neural networks. Several generalization properties of deep networks, such as convergence to flatter minima, are believed to arise from SGD. This article explores the…

Machine Learning · Computer Science 2024-12-05 Aditya Shah, Aditya Challa, Sravan Danda, Archana Mathur, Snehanshu Saha

Shape Guided Gradient Voting for Domain Generalization

Domain generalization aims to address the domain shift between training and testing data. To learn the domain invariant representations, the model is usually trained on multiple domains. It has been found that the gradients of network…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Jiaqi Xu, Yuwang Wang, Xuejin Chen

Scaling Private Deep Learning with Low-Rank and Sparse Gradients

Applying Differentially Private Stochastic Gradient Descent (DPSGD) to training modern, large-scale neural networks such as transformer-based models is a challenging task, as the magnitude of noise added to the gradients at each iteration…

Machine Learning · Computer Science 2022-07-07 Ryuichi Ito, Seng Pei Liew, Tsubasa Takahashi, Yuya Sasaki, Makoto Onizuka

Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Adversarial training, especially projected gradient descent (PGD), has proven to be a successful approach for improving robustness against adversarial attacks. After adversarial training, gradients of models with respect to their inputs…

Machine Learning · Statistics 2023-04-21 Ricardo Bigolin Lanfredi, Joyce D. Schroeder, Tolga Tasdizen

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks.…

Machine Learning · Computer Science 2017-12-07 Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

Adaptive Sketches for Robust Regression with Importance Sampling

We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large scale…

Machine Learning · Computer Science 2022-07-19 Sepideh Mahabadi, David P. Woodruff, Samson Zhou

Sequential Policy Gradient for Adaptive Hyperparameter Optimization

Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but the conventional approaches impede widespread use due to prohibitive time and computational costs. Inspired by DeepSeek-V3 multi-token…

Machine Learning · Computer Science 2025-06-19 Zheng Li, Jerry Cheng, Huanying Helen Gu

Shifted Partial Derivative Polynomial Rank and Codimension

Shifted partial derivative (SPD) methods are a central algebraic tool for circuit lower bounds, measuring the dimension of spaces of shifted derivatives of a polynomial. We develop the Shifted Partial Derivative Polynomial (SPDP) framework,…

Computational Complexity · Computer Science 2025-12-25 Darren J. Edwards

Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight

In the domain of deep learning, the challenge of protecting sensitive data while maintaining model utility is significant. Traditional Differential Privacy (DP) techniques such as Differentially Private Stochastic Gradient Descent (DP-SGD)…

Machine Learning · Computer Science 2024-11-06 Tao Huang, Qingyu Huang, Xin Shi, Jiayang Meng, Guolong Zheng, Xu Yang, Xun Yi

SPDY: Accurate Pruning with Speedup Guarantees

The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly-growing computational…

Machine Learning · Computer Science 2022-08-25 Elias Frantar, Dan Alistarh