QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Computer Vision and Pattern Recognition 2025-07-16 v6

Abstract

The practical deployment of diffusion models is still hindered by high memory and computational overhead. Although quantization offers a path to model compression and acceleration, existing methods struggle to achieve low-bit quantization efficiently. In this paper, we identify imbalanced activation distributions as a primary source of quantization difficulty, and propose adjusting these distributions through weight finetuning to make them more quantization-friendly. We provide both theoretical and empirical evidence supporting finetuning as a practical and reliable solution. Building on this approach, we further distinguish two critical types of quantized layers: those responsible for retaining essential temporal information and those particularly sensitive to bit-width reduction. By selectively finetuning these layers under both local and global supervision, we mitigate performance degradation while enhancing quantization efficiency. We demonstrate the method's efficacy on three high-resolution image generation tasks, where it obtains state-of-the-art performance across multiple bit-width settings.
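To make the claim about imbalanced activation distributions concrete, the following is a minimal toy sketch, not the paper's method. It assumes a plain min-max uniform quantizer and synthetic data; the names `quantize_uniform`, `mse`, `balanced`, and `imbalanced` are hypothetical and chosen only for illustration. It compares the reconstruction error of a narrow, roughly Gaussian set of values with the same values plus a handful of large outliers.

```python
import random


def quantize_uniform(values, n_bits):
    """Min-max uniform quantization followed by dequantization (toy version)."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    # Guard against a zero range so the division below is always defined.
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) * scale + lo for v in values]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)

# Assumed "balanced" activations: values concentrated in a narrow range.
balanced = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Assumed "imbalanced" activations: the same bulk plus a few large outliers,
# which stretch the quantization range and coarsen the step size.
imbalanced = balanced[:-10] + [rng.uniform(50.0, 100.0) for _ in range(10)]

# errors[bits] = (MSE on balanced data, MSE on imbalanced data)
errors = {}
for bits in (8, 4):
    err_bal = mse(balanced, quantize_uniform(balanced, bits))
    err_imb = mse(imbalanced, quantize_uniform(imbalanced, bits))
    errors[bits] = (err_bal, err_imb)
    print(f"{bits}-bit  balanced MSE={err_bal:.5f}  imbalanced MSE={err_imb:.5f}")
```

In this sketch the outliers widen the min-max range, so most values share only a few quantization levels and the error grows, more sharply at 4 bits than at 8. The paper addresses the underlying imbalance by finetuning weights; the toy quantizer here only illustrates why such imbalance is costly.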

Cite

@article{arxiv.2402.03666,
  title   = {QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning},
  author  = {Haoxuan Wang and Yuzhang Shang and Zhihang Yuan and Junyi Wu and Junchi Yan and Yan Yan},
  journal = {arXiv preprint arXiv:2402.03666},
  year    = {2025}
}

Comments

ICCV 2025. Code is available at https://github.com/hatchetProject/QuEST