Related papers: Softmax Bias Correction for Quantized Generative M…

Q-Diffusion: Quantizing Diffusion Models

Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model…

Computer Vision and Pattern Recognition · Computer Science 2023-06-09 Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer

PTQD: Accurate Post-Training Quantization for Diffusion Models

Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world…

Computer Vision and Pattern Recognition · Computer Science 2023-11-02 Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can help reduce the size and computational cost of deep…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, Wenyu Liu

Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models

Diffusion models have shown remarkable performance in image synthesis by progressively estimating a smooth transition from a Gaussian distribution of noise to a real image. Unfortunately, their practical deployment is limited by slow…

Machine Learning · Computer Science 2026-03-03 Dung Anh Hoang, Cuong Pham, Trung Le, Jianfei Cai, Thanh-Toan Do

Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance

Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures. Despite its effectiveness and convenience, the reliability of PTQ methods…

Machine Learning · Computer Science 2023-03-24 Zhihang Yuan, Jiawei Liu, Jiaxiang Wu, Dawei Yang, Qiang Wu, Guangyu Sun, Wenyu Liu, Xinggang Wang, Bingzhe Wu

Post-training Quantization on Diffusion Models

Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data into noise and a…

Computer Vision and Pattern Recognition · Computer Science 2023-03-17 Yuzhang Shang, Zhihang Yuan, Bin Xie, Bingzhe Wu, Yan Yan

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

Timestep-Aware Correction for Quantized Diffusion Models

Diffusion models have marked a significant breakthrough in the synthesis of semantically coherent images. However, their extensive noise estimation networks and the iterative generation process limit their wider application, particularly on…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang

Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models

Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but their massive sizes hinder deployment on resource-constrained devices. To reduce their computational and memory burden, various compression…

Machine Learning · Computer Science 2026-05-18 Dung Anh Hoang, Cuong Pham, Cuong Nguyen, Trung Le, Jianfei Cai, Thanh-Toan Do

Efficient Quantization Strategies for Latent Diffusion Models

Latent Diffusion Models (LDMs) capture the dynamic evolution of latent variables over time, blending patterns and multimodality in a generative system. Despite the proficiency of LDM in various applications, such as text-to-image…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Yuewei Yang, Xiaoliang Dai, Jialiang Wang, Peizhao Zhang, Hongbo Zhang

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

Large Language Models (LLMs) suffer severe performance degradation when facing extremely low-bit (sub 2-bit) quantization. Several existing sub 2-bit post-training quantization (PTQ) methods utilize a mix-precision scheme by leveraging an…

Machine Learning · Computer Science 2025-08-07 Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, Yaowei Wang, Min Zhang

Post-Training Quantization for Audio Diffusion Transformers

Diffusion Transformers (DiTs) enable high-quality audio synthesis but are often computationally intensive and require substantial storage, which limits their practical deployment. In this paper, we present a comprehensive evaluation of…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-02 Tanmay Khandelwal, Magdalena Fuentes

BAQ: Efficient Bit Allocation Quantization for Large Language Models

Post-training model quantization is a widely adopted technique for reducing the memory and computational costs of large language models (LLMs). However, most existing methods rely on uniform or heuristic bitwidth assignments, failing to…

Machine Learning · Computer Science 2025-06-09 Chao Zhang, Li Wang, Samson Lasaulce, Merouane Debbah

Assessing the Potential for Catastrophic Failure in Dynamic Post-Training Quantization

Post-training quantization (PTQ) has recently emerged as an effective tool for reducing the computational complexity and memory usage of a neural network by representing its weights and activations with lower precision. While this paradigm…

Machine Learning · Computer Science 2025-10-06 Logan Frank, Paul Ardis

A Comprehensive Evaluation on Quantization Techniques for Large Language Models

For large language models (LLMs), post-training quantization (PTQ) can significantly reduce memory footprint and computational overhead. Model quantization is rapidly evolving. Though many papers report breakthrough results, they are often…

Machine Learning · Computer Science 2026-01-30 Yutong Liu, Cairong Zhao, Guosheng Hu

Optimizing Large Language Models through Quantization: A Comparative Analysis of PTQ and QAT Techniques

This paper presents a comprehensive analysis of quantization techniques for optimizing Large Language Models (LLMs), specifically focusing on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). Through empirical…

Machine Learning · Computer Science 2024-11-12 Jahid Hasan

StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models

In this paper, we propose StableQuant, a novel adaptive post-training quantization (PTQ) algorithm for widely used speech foundation models (SFMs). While PTQ has been successfully employed for compressing large language models (LLMs) due to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-04-22 Yeona Hong, Hyewon Han, Woo-jin Chung, Hong-Goo Kang

Enhancing Post-Training Quantization via Future Activation Awareness

Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. It typically sets quantization hyperparameters (e.g., scaling factors) based on current-layer activations. Although this…

Machine Learning · Computer Science 2026-02-04 Zheqi Lv, Zhenxuan Fan, Qi Tian, Wenqiao Zhang, Yueting Zhuang

Norm Tweaking: High-performance Low-bit Quantization of Large Language Models

As the size of large language models (LLMs) continues to grow, model compression without sacrificing accuracy has become a crucial challenge for deployment. While some quantization methods, such as GPTQ, have made progress in achieving…

Machine Learning · Computer Science 2023-12-14 Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu