Position-based Scaled Gradient for Model Quantization and Pruning

Computer Vision and Pattern Recognition 2020-11-12 v4 Machine Learning

Abstract

We propose the position-based scaled gradient (PSG), which scales the gradient according to the position of a weight vector to make the weights more compression-friendly. First, we show theoretically that applying PSG to standard gradient descent (GD), which we call PSGD, is equivalent to GD in a warped weight space, a space obtained by warping the original weight space via an appropriately designed invertible function. Second, we show empirically that PSG, acting as a regularizer on the weight vector, is favorable for model compression domains such as quantization and pruning. PSG reduces the gap between the weight distributions of a full-precision model and its compressed counterpart. This enables versatile deployment of a model in either uncompressed or compressed mode, depending on the availability of resources. Experimental results on the CIFAR-10/100 and ImageNet datasets show the effectiveness of the proposed PSG for both pruning and quantization, even at extremely low bit-widths. The code is released on GitHub.
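
To make the idea of a position-dependent gradient scale concrete, the following is a minimal sketch and not the released implementation. The abstract does not state the form of the scaling function, so the sketch assumes one plausible choice: each gradient component is multiplied by the distance between its weight and the nearest point of a uniform quantization grid, plus a small constant. The names `nearest_grid_point`, `psg_scale`, and `psgd_step`, the uniform grid, and all numeric values are hypothetical and chosen only for illustration.

```python
def nearest_grid_point(w, step):
    """Nearest multiple of `step` to w (a uniform grid is assumed here)."""
    return round(w / step) * step


def psg_scale(w, step, eps=1e-3):
    """Assumed scaling factor: distance to the nearest grid point, plus eps.

    The factor is small when w sits close to a grid point and grows as w
    moves away from it. This form is an illustration, not the paper's
    stated definition.
    """
    return abs(w - nearest_grid_point(w, step)) + eps


def psgd_step(weights, grads, lr, step, eps=1e-3):
    """One descent step with every gradient component scaled by psg_scale."""
    return [w - lr * psg_scale(w, step, eps) * g
            for w, g in zip(weights, grads)]


# Toy values (made up): one weight close to the grid point 0.25 and one
# farther from it, both receiving the same gradient.
weights = [0.26, 0.37]
grads = [1.0, 1.0]

new_weights = psgd_step(weights, grads, lr=0.1, step=0.25)
moves = [abs(n - w) for n, w in zip(new_weights, weights)]
print(moves)  # the first weight moves less than the second
```

Under this assumed scale, a weight that is already near a grid point is updated only slightly, while a weight far from the grid takes a larger step for the same gradient. This is one way a position-dependent scale could act as a regularizer that keeps the full-precision weights close to their quantized values.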

Cite

@article{arxiv.2005.11035,
  title  = {Position-based Scaled Gradient for Model Quantization and Pruning},
  author = {Jangho Kim and KiYoon Yoo and Nojun Kwak},
  journal= {arXiv preprint arXiv:2005.11035},
  year   = {2020}
}

Comments

Advances in Neural Information Processing Systems
