Related papers: Effective Quantization for Diffusion Models on CPU…

Diffusion Product Quantization

In this work, we explore the quantization of diffusion models in extreme compression regimes to reduce model size while maintaining performance. We begin by investigating classical vector quantization but find that diffusion models are…

Computer Vision and Pattern Recognition · Computer Science 2024-11-20 Jie Shao, Hanxiao Zhang, Jianxin Wu

Diffusion Model Quantization: A Review

Recent success of large text-to-image models has empirically underscored the exceptional performance of diffusion models in generative tasks. To facilitate their efficient deployment on resource-constrained edge devices, model quantization…

Computer Vision and Pattern Recognition · Computer Science 2025-05-09 Qian Zeng, Chenggong Hu, Mingli Song, Jie Song

PQD: Post-training Quantization for Efficient Diffusion Models

Diffusion models (DMs) have demonstrated remarkable achievements in synthesizing images of high fidelity and diversity. However, the extensive computational requirements and slow generative speed of diffusion models have limited their widespread…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Jiaojiao Ye, Zhen Wang, Linnan Jiang

Q-Diffusion: Quantizing Diffusion Models

Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model…

Computer Vision and Pattern Recognition · Computer Science 2023-06-09 Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Diffusion transformers have demonstrated remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Fast DistilBERT on CPUs

Transformer-based language models have become the standard approach to solving natural language processing tasks. However, industry adoption usually requires the maximum throughput to comply with certain latency constraints that prevents…

Computation and Language · Computer Science 2022-12-08 Haihao Shen, Ofir Zafrir, Bo Dong, Hengyu Meng, Xinyu Ye, Zhe Wang, Yi Ding, Hanwen Chang, Guy Boudoukh, Moshe Wasserblat

Temporal Dynamic Quantization for Diffusion Models

The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, Eunhyeok Park

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

The practical deployment of diffusion models is still hindered by the high memory and computational overhead. Although quantization paves a way for model compression and acceleration, existing methods face challenges in achieving low-bit…

Computer Vision and Pattern Recognition · Computer Science 2025-07-16 Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Junchi Yan, Yan Yan

Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models

Diffusion models are emerging models that generate images by iteratively denoising random Gaussian noise using deep neural networks. These models typically exhibit high computational and memory demands, necessitating effective post-training…

Computer Vision and Pattern Recognition · Computer Science 2024-08-14 Cheng Chen, Christina Giannoula, Andreas Moshovos

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

Diffusion models have demonstrated remarkable capabilities in image synthesis and related generative tasks. Nevertheless, their practicality for real-world applications is constrained by substantial computational costs and latency issues.…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

Lossy Image Compression with Foundation Diffusion Models

Incorporating diffusion models in the image compression domain has the potential to produce realistic and detailed reconstructions, especially at extremely low bitrates. Previous methods focus on using diffusion models as expressive…

Image and Video Processing · Electrical Eng. & Systems 2024-10-10 Lucas Relic, Roberto Azevedo, Markus Gross, Christopher Schroers

Q-VDiT: Towards Accurate Quantization and Distillation of Video-Generation Diffusion Transformers

Diffusion transformers (DiT) have demonstrated exceptional performance in video generation. However, their large number of parameters and high computational complexity limit their deployment on edge devices. Quantization can reduce storage…

Computer Vision and Pattern Recognition · Computer Science 2025-05-29 Weilun Feng, Chuanguang Yang, Haotong Qin, Xiangqi Li, Yu Wang, Zhulin An, Libo Huang, Boyu Diao, Zixiang Zhao, Yongjun Xu, Michele Magno

Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss

Based on the model's resilience to computational noise, model quantization is important for compressing models and improving computing speed. Existing quantization techniques rely heavily on experience and "fine-tuning" skills. In the…

Machine Learning · Computer Science 2022-07-22 Daning Cheng, Wenguang Chen

Optimizing Inference in Transformer-Based Models: A Multi-Method Benchmark

Efficient inference is a critical challenge in deep generative modeling, particularly as diffusion models grow in capacity and complexity. While increased complexity often improves accuracy, it raises compute costs, latency, and memory…

Machine Learning · Computer Science 2025-09-24 Siu Hang Ho, Prasad Ganesan, Nguyen Duong, Daniel Schlabig

Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only…

Machine Learning · Computer Science 2018-05-22 Sean O. Settle, Manasa Bollavaram, Paolo D'Alberto, Elliott Delaye, Oscar Fernandez, Nicholas Fraser, Aaron Ng, Ashish Sirasao, Michael Wu

Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion

Text-to-image generation via Stable Diffusion models (SDM) has demonstrated remarkable capabilities. However, their computational intensity, particularly in the iterative denoising process, hinders real-time deployment in latency-sensitive…

Computer Vision and Pattern Recognition · Computer Science 2025-05-08 Shuaiting Li, Juncan Deng, Zeyu Wang, Kedong Xu, Rongtao Deng, Hong Gu, Haibin Shen, Kejie Huang

Text Embedding Knows How to Quantize Text-Guided Diffusion Models

Despite the success of diffusion models in image generation tasks such as text-to-image, the enormous computational complexity of diffusion models limits their use in resource-constrained environments. To address this, network quantization…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Hongjae Lee, Myungjun Son, Dongjea Kang, Seung-Won Jung

Timestep-Aware Correction for Quantized Diffusion Models

Diffusion models have marked a significant breakthrough in the synthesis of semantically coherent images. However, their extensive noise estimation networks and the iterative generation process limit their wider application, particularly on…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang

Diffusion Models on the Edge: Challenges, Optimizations, and Applications

Diffusion models have shown remarkable capabilities in generating high-fidelity data across modalities such as images, audio, and video. However, their computational intensity makes deployment on edge devices a significant challenge. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-23 Dongqi Zheng

Accelerating Neural Network Inference by Overflow Aware Quantization

The inherent heavy computation of deep neural networks prevents their widespread applications. A widely used method for accelerating model inference is quantization, by replacing the input operands of a network using fixed-point values.…

Computer Vision and Pattern Recognition · Computer Science 2020-05-28 Hongwei Xie, Shuo Zhang, Huanghao Ding, Yafei Song, Baitao Shao, Conggang Hu, Ling Cai, Mingyang Li