Rotation Invariant Quantization for Model Compression

Joseph Kampeas; Yury Nahshan; Hanoch Kremer; Gil Lederman; Shira Zaloshinski; Zheng Li; Emir Haleva

Machine Learning 2024-12-03 v3 Artificial Intelligence Information Theory math.IT

Abstract

Post-training Neural Network (NN) model compression is an attractive approach for deploying large, memory-consuming models on devices with limited memory resources. In this study, we investigate the rate-distortion tradeoff for NN model compression. First, we suggest a Rotation-Invariant Quantization (RIQ) technique that utilizes a single parameter to quantize the entire NN model, yielding a different rate at each layer, i.e., mixed-precision quantization. Then, we prove that our rotation-invariant approach is optimal in terms of compression. We rigorously evaluate RIQ and demonstrate its capabilities on various models and tasks. For example, RIQ facilitates $\times 19.4$ and $\times 52.9$ compression ratios on pre-trained VGG dense and pruned models, respectively, with $<0.4\%$ accuracy degradation. Code is available at https://github.com/ehaleva/RIQ.
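
The idea of a single global parameter producing a different rate at each layer can be illustrated with a minimal sketch. It assumes, purely for illustration, that each layer's bin width is its L2 norm divided by a global parameter `k`. This rule, the helper names (`layer_step`, `quantize`, `empirical_rate`), the value of `k`, and the synthetic Gaussian layers are all hypothetical and are not taken from the paper or the RIQ repository.

```python
import numpy as np

def layer_step(w, k):
    """Hypothetical rule: bin width proportional to the layer's L2 norm."""
    return np.linalg.norm(w) / k

def quantize(w, step):
    """Uniform quantization of weights to integer bin indices."""
    return np.round(w / step).astype(np.int64)

def empirical_rate(q):
    """Empirical entropy of the bin indices, in bits per weight."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)

# Two synthetic "layers" with the same weight scale but different sizes.
layers = {
    "small": rng.normal(0.0, 0.05, size=1_000),
    "large": rng.normal(0.0, 0.05, size=100_000),
}

k = 200.0  # single global parameter (illustrative value, an assumption)

# The same k gives each layer its own bin width and hence its own rate.
rates = {}
for name, w in layers.items():
    step = layer_step(w, k)
    rates[name] = empirical_rate(quantize(w, step))
    print(f"{name}: step={step:.5f}, rate={rates[name]:.2f} bits/weight")

# The L2 norm is unchanged by an orthogonal rotation, so the bin width
# computed from it is the same before and after rotating the layer.
w = layers["small"]
Q, _ = np.linalg.qr(rng.normal(size=(w.size, w.size)))  # random orthogonal matrix
step_before = layer_step(w, k)
step_after = layer_step(Q @ w, k)
print(f"step before rotation={step_before:.6f}, after={step_after:.6f}")
```

Under this assumed rule, the larger layer has a larger norm and therefore a coarser bin relative to its weight spread, so its empirical rate is lower than that of the smaller layer even though `k` is shared. The bin width depends on the weights only through their norm, which is why it does not change under rotation.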

Keywords

quantization, image compression, matrix factorization

Cite

@article{arxiv.2303.03106,
  title  = {Rotation Invariant Quantization for Model Compression},
  author = {Joseph Kampeas and Yury Nahshan and Hanoch Kremer and Gil Lederman and Shira Zaloshinski and Zheng Li and Emir Haleva},
  journal= {arXiv preprint arXiv:2303.03106},
  year   = {2024}
}

Comments

20 pages, 5 figures
