Related papers: Position-based Scaled Gradient for Model Quantizat…

200 papers

We study gradient compression methods to alleviate the communication bottleneck in data-parallel distributed optimization. Despite the significant attention received, current compression schemes either do not scale well or fail to achieve…

Machine Learning · Computer Science 2020-02-19 Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi

Due to their high computational complexity, deep neural networks are still limited to powerful processing units. To promote a reduced model complexity by dint of low-bit fixed-point quantization, we propose a gradient-based optimization…

Machine Learning · Computer Science 2019-07-18 Lukas Enderich, Fabian Timm, Lars Rosenbaum, Wolfram Burgard

We investigate projected scaled gradient (PSG) methods for convex minimization problems. These methods perform a descent step along a diagonally scaled gradient direction followed by a feasibility regaining step via orthogonal projection…

Optimization and Control · Mathematics 2015-07-28 W. Jin, Y. Censor, M. Jiang

Stochastic gradient descent (SGD) is a prevalent optimization technique for large-scale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck…

Machine Learning · Computer Science 2021-05-24 Dmitrii Avdiukhin, Grigory Yaroslavtsev

Stochastic gradient descent (SGD) and projected stochastic gradient descent (PSGD) are scalable algorithms to compute model parameters in unconstrained and constrained optimization problems. In comparison with SGD, PSGD forces its iterative…

Machine Learning · Statistics 2022-03-24 Ruiqi Liu, Mingao Yuan, Zuofeng Shang

While pruning methods effectively maintain model performance without extra training costs, they often focus solely on preserving crucial connections, overlooking the impact of pruned weights on subsequent fine-tuning or distillation,…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Hyeonjin Kim, Jaejun Yoo

Stochastic gradient descent (SGD) is a promising method for solving large-scale inverse problems, due to its excellent scalability with respect to data size. In this work, we analyze a new data-driven regularized stochastic gradient descent…

Numerical Analysis · Mathematics 2024-09-30 Zehui Zhou

We study the problem of finding the best linear model that can minimize least-squares loss given a data-set. While this problem is trivial in the low dimensional regime, it becomes more interesting in high dimensions where the population…

Machine Learning · Computer Science 2021-02-09 Yahya Sattar, Samet Oymak

Massive amounts of data have made the training of large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to tackle…

Machine Learning · Computer Science 2022-03-31 S Vineeth

Compressing large-scale neural networks is essential for deploying models on resource-constrained devices. Most existing methods adopt weight pruning or low-bit quantization individually, often resulting in suboptimal compression rates to…

Machine Learning · Computer Science 2025-10-13 Ziyi Wang, Nan Jiang, Guang Lin, Qifan Song

Stochastic Gradient Descent (SGD) is the main approach to optimizing neural networks. Several generalization properties of deep networks, such as convergence to flatter minima, are believed to arise from SGD. This article explores the…

Machine Learning · Computer Science 2024-12-05 Aditya Shah, Aditya Challa, Sravan Danda, Archana Mathur, Snehanshu Saha

Domain generalization aims to address the domain shift between training and testing data. To learn the domain invariant representations, the model is usually trained on multiple domains. It has been found that the gradients of network…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Jiaqi Xu, Yuwang Wang, Xuejin Chen

Applying Differentially Private Stochastic Gradient Descent (DPSGD) to training modern, large-scale neural networks such as transformer-based models is a challenging task, as the magnitude of noise added to the gradients at each iteration…

Machine Learning · Computer Science 2022-07-07 Ryuichi Ito, Seng Pei Liew, Tsubasa Takahashi, Yuya Sasaki, Makoto Onizuka

Adversarial training, especially projected gradient descent (PGD), has proven to be a successful approach for improving robustness against adversarial attacks. After adversarial training, gradients of models with respect to their inputs…

Machine Learning · Statistics 2023-04-21 Ricardo Bigolin Lanfredi, Joyce D. Schroeder, Tolga Tasdizen

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks.…

Machine Learning · Computer Science 2017-12-07 Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large scale…

Machine Learning · Computer Science 2022-07-19 Sepideh Mahabadi, David P. Woodruff, Samson Zhou

Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but the conventional approaches impede widespread use due to prohibitive time and computational costs. Inspired by DeepSeek-V3 multi-token…

Machine Learning · Computer Science 2025-06-19 Zheng Li, Jerry Cheng, Huanying Helen Gu

Shifted partial derivative (SPD) methods are a central algebraic tool for circuit lower bounds, measuring the dimension of spaces of shifted derivatives of a polynomial. We develop the Shifted Partial Derivative Polynomial (SPDP) framework,…

Computational Complexity · Computer Science 2025-12-25 Darren J. Edwards

In the domain of deep learning, the challenge of protecting sensitive data while maintaining model utility is significant. Traditional Differential Privacy (DP) techniques such as Differentially Private Stochastic Gradient Descent (DP-SGD)…

Machine Learning · Computer Science 2024-11-06 Tao Huang, Qingyu Huang, Xin Shi, Jiayang Meng, Guolong Zheng, Xu Yang, Xun Yi

The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly-growing computational…

Machine Learning · Computer Science 2022-08-25 Elias Frantar, Dan Alistarh