Related papers: Low-Bitwidth Floating Point Quantization for Effic…

Q-Diffusion: Quantizing Diffusion Models

Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model…

Computer Vision and Pattern Recognition · Computer Science 2023-06-09 Xiuyu Li , Yijiang Liu , Long Lian , Huanrui Yang , Zhen Dong , Daniel Kang , Shanghang Zhang , Kurt Keutzer

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

The practical deployment of diffusion models is still hindered by the high memory and computational overhead. Although quantization paves a way for model compression and acceleration, existing methods face challenges in achieving low-bit…

Computer Vision and Pattern Recognition · Computer Science 2025-07-16 Haoxuan Wang , Yuzhang Shang , Zhihang Yuan , Junyi Wu , Junchi Yan , Yan Yan

FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers

Diffusion Models (DM) have revolutionized the text-to-image visual generation process. However, the large computational cost and model footprint of DMs hinder practical deployment, especially on edge devices. Post-training quantization…

Computer Vision and Pattern Recognition · Computer Science 2026-01-05 Ruichen Chen , Keith G. Mills , Di Niu

HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization

Diffusion Transformers (DiTs) have recently gained substantial attention in both industrial and academic fields for their superior visual generation capabilities, outperforming traditional diffusion models that use U-Net. However, the…

Computer Vision and Pattern Recognition · Computer Science 2024-06-03 Wenxuan Liu , Sai Qian Zhang

Effective Quantization for Diffusion Models on CPUs

Diffusion models have gained popularity for generating images from textual descriptions. Nonetheless, the substantial need for computational resources continues to present a noteworthy challenge, contributing to time-consuming processes.…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Hanwen Chang , Haihao Shen , Yiyang Cai , Xinyu Ye , Zhenzhong Xu , Wenhua Cheng , Kaokao Lv , Weiwei Zhang , Yintong Lu , Heng Guo

Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

Model quantization reduces the bit-width of weights and activations, improving memory efficiency and inference speed in diffusion models. However, achieving 4-bit quantization remains challenging. Existing methods, primarily based on…

Machine Learning · Computer Science 2025-05-29 Maosen Zhao , Pengtao Chen , Chong Yu , Yan Wen , Xudong Tan , Tao Chen

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenges for their application on resource-constrained mobile devices or even desktop GPUs. Recent…

Computer Vision and Pattern Recognition · Computer Science 2024-05-31 Tianchen Zhao , Xuefei Ning , Tongcheng Fang , Enshu Liu , Guyue Huang , Zinan Lin , Shengen Yan , Guohao Dai , Yu Wang

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

Diffusion models have demonstrated remarkable capabilities in image synthesis and related generative tasks. Nevertheless, their practicality for real-world applications is constrained by substantial computational costs and latency issues.…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Yefei He , Jing Liu , Weijia Wu , Hong Zhou , Bohan Zhuang

Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs

Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Shivam Aggarwal , Hans Jakob Damsgaard , Alessandro Pappalardo , Giuseppe Franco , Thomas B. Preußer , Michaela Blott , Tulika Mitra

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

Machine Learning · Computer Science 2018-05-22 Sean O. Settle , Manasa Bollavaram , Paolo D'Alberto , Elliott Delaye , Oscar Fernandez , Nicholas Fraser , Aaron Ng , Ashish Sirasao , Michael Wu

Low-bit Model Quantization for Deep Neural Networks: A Survey

With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an…

Machine Learning · Computer Science 2025-05-12 Kai Liu , Qian Zheng , Kaiwen Tao , Zhiteng Li , Haotong Qin , Wenbo Li , Yong Guo , Xianglong Liu , Linghe Kong , Guihai Chen , Yulun Zhang , Xiaokang Yang

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

Efficient deployment of large language models (LLMs) necessitates low-bit quantization to minimize model size and inference cost. While low-bit integer formats (e.g., INT8/INT4) have been the conventional choice, emerging low-bit…

Machine Learning · Computer Science 2023-05-23 Yijia Zhang , Lingran Zhao , Shijie Cao , Wenqiang Wang , Ting Cao , Fan Yang , Mao Yang , Shanghang Zhang , Ningyi Xu

StatQAT: Statistical Quantizer Optimization for Deep Networks

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes,…

Machine Learning · Statistics 2026-05-19 Mehmet Aktukmak , Daniel Huang , Ke Ding

Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion

Text-to-image generation via Stable Diffusion models (SDM) has demonstrated remarkable capabilities. However, their computational intensity, particularly in the iterative denoising process, hinders real-time deployment in latency-sensitive…

Computer Vision and Pattern Recognition · Computer Science 2025-05-08 Shuaiting Li , Juncan Deng , Zeyu Wang , Kedong Xu , Rongtao Deng , Hong Gu , Haibin Shen , Kejie Huang

MPQ-Diff: Mixed Precision Quantization for Diffusion Models

Diffusion models (DMs) generate remarkably high-quality images via the stochastic denoising process, which unfortunately incurs high sampling time. Post-quantizing the trained diffusion models in fixed bit-widths, e.g., 4 bits on weights…

Computer Vision and Pattern Recognition · Computer Science 2024-12-03 Rocco Manz Maruzzelli , Basile Lewandowski , Lydia Y. Chen

PQD: Post-training Quantization for Efficient Diffusion Models

Diffusion models (DMs) have demonstrated remarkable achievements in synthesizing images of high fidelity and diversity. However, the extensive computational requirements and slow generative speed of diffusion models have limited their widespread…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Jiaojiao Ye , Zhen Wang , Linnan Jiang

Temporal Dynamic Quantization for Diffusion Models

The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Junhyuk So , Jungwon Lee , Daehyun Ahn , Hyungjun Kim , Eunhyeok Park

Scaled Quantization for the Vision Transformer

Quantization using a small number of bits shows promise for reducing latency and memory usage in deep neural networks. However, most quantization methods cannot readily handle complicated functions such as exponential and square root, and…

Image and Video Processing · Electrical Eng. & Systems 2023-03-27 Yangyang Chang , Gerald E. Sobelman

Quantizing deep convolutional networks for efficient inference: A whitepaper

We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8-bits of precision…

Machine Learning · Computer Science 2018-06-22 Raghuraman Krishnamoorthi

MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models

Diffusion models have received wide attention in generation tasks. However, the expensive computation cost prevents the application of diffusion models in resource-constrained scenarios. Quantization emerges as a practical solution that…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Weilun Feng , Haotong Qin , Chuanguang Yang , Zhulin An , Libo Huang , Boyu Diao , Fei Wang , Renshuai Tao , Yongjun Xu , Michele Magno