Related papers: GDRQ: Group-based Distribution Reshaping for Quant…

DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models

Despite the widespread use of text-to-image diffusion models across various tasks, their computational and memory demands limit practical applications. To mitigate this issue, quantization of diffusion models has been explored. It reduces…

Computer Vision and Pattern Recognition · Computer Science 2025-02-13 Hyogon Ryu, NaHyeon Park, Hyunjung Shim

Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

Quantized neural networks typically require smaller memory footprints and lower computation complexity, which is crucial for efficient deployment. However, quantization inevitably leads to a distribution divergence from the original…

Computer Vision and Pattern Recognition · Computer Science 2022-05-30 Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma

GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs

Quantization techniques such as BitsAndBytes, AWQ, and GPTQ are widely used as standard methods for deploying large language models but often degrade accuracy when using low-bit representations, e.g., 4 bits. Low-rank correction methods…

Machine Learning · Computer Science 2026-05-01 Selim An, Il hong Suh, Yeseong Kim

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate inference while reducing the memory consumption of deep neural networks, which is crucial for model deployment on…

Computer Vision and Pattern Recognition · Computer Science 2019-08-15 Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan

Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss

Network quantization, which aims to reduce the bit-lengths of network weights and activations, has emerged as a way to deploy networks on resource-limited devices. Although recent studies have successfully discretized a full-precision…

Machine Learning · Computer Science 2021-09-07 Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang

SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks

Inference for state-of-the-art deep neural networks is computationally expensive, making them difficult to deploy on constrained hardware environments. An efficient way to reduce this complexity is to quantize the weight parameters and/or…

Computer Vision and Pattern Recognition · Computer Science 2018-07-03 Julian Faraone, Nicholas Fraser, Michaela Blott, Philip H. W. Leong

Adaptive Distribution-aware Quantization for Mixed-Precision Neural Networks

Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-23 Shaohang Jia, Zhiyong Huang, Zhi Yu, Mingyang Hou, Shuai Miao, Han Yang

Cluster Regularized Quantization for Deep Networks Compression

Deep neural networks (DNNs) have achieved great success in a wide range of computer vision areas, but their application to mobile devices is limited by their high storage and computational cost. Much effort has been devoted to compress…

Computer Vision and Pattern Recognition · Computer Science 2019-05-14 Yiming Hu, Jianquan Li, Xianlei Long, Shenhua Hu, Jiagang Zhu, Xingang Wang, Qingyi Gu

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has great potential to increase inference speed by leveraging bit operations, there is still a noticeable gap in terms of…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…

Computer Vision and Pattern Recognition · Computer Science 2021-11-03 Kohei Yamamoto

DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks

Quantizing deep convolutional neural networks for image super-resolution substantially reduces their computational costs. However, existing works either suffer from a severe performance drop in ultra-low precision of 4 or lower bit-widths,…

Computer Vision and Pattern Recognition · Computer Science 2022-07-08 Cheeun Hong, Heewon Kim, Sungyong Baik, Junghun Oh, Kyoung Mu Lee

Semi-Relaxed Quantization with DropBits: Training Low-Bit Neural Networks via Bit-wise Regularization

Network quantization, which aims to reduce the bit-lengths of network weights and activations, has emerged as one of the key ingredients for reducing the size of neural networks for deployment on resource-limited devices. In order…

Computer Vision and Pattern Recognition · Computer Science 2021-09-08 Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang

Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models

Diffusion models have shown remarkable performance in image synthesis by progressively estimating a smooth transition from a Gaussian distribution of noise to a real image. Unfortunately, their practical deployment is limited by slow…

Machine Learning · Computer Science 2026-03-03 Dung Anh Hoang, Cuong Pham anh Trung Le, Jianfei Cai, Thanh-Toan Do

Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues

Dynamic quantization has attracted rising attention in image super-resolution (SR) as it expands the potential of heavy SR models onto mobile devices while preserving competitive performance. Existing methods explore layer-to-bit…

Image and Video Processing · Electrical Eng. & Systems 2024-12-24 Mingshen Wang, Zhao Zhang, Feng Li, Ke Xu, Kang Miao, Meng Wang

StatQAT: Statistical Quantizer Optimization for Deep Networks

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes,…

Machine Learning · Statistics 2026-05-19 Mehmet Aktukmak, Daniel Huang, Ke Ding

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling

Quantization has become a standard tool for efficient LLM deployment, especially for local inference, where models are now routinely served at 2-3 bits per parameter. The state of the art is currently split into simple scalar quantization…

Computation and Language · Computer Science 2026-05-18 Alireza Dadgarnia, Soroush Tabesh, Mahdi Nikdan, Michael Helcig, Eldar Kurtic, Maximilian Kleinegger, Dan Alistarh

Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression

Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing…

Machine Learning · Computer Science 2026-01-27 Xi Zhang, Xiaolin Wu, Jiamang Wang, Weisi Lin

DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization

Diffusion models have achieved remarkable success in image generation but come with significant computational costs, posing challenges for deployment in resource-constrained environments. Recent post-training quantization (PTQ) methods have…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Dongyeun Lee, Jiwan Hur, Hyounguk Shon, Jae Young Lee, Junmo Kim

Scaling Image Tokenizers with Grouped Spherical Quantization

Vision tokenizers have gained a lot of attraction due to their scalability and compactness; previous works depend on old-school GAN-based hyperparameters, biased comparisons, and a lack of comprehensive analysis of the scaling behaviours.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 Jiangtao Wang, Zhen Qin, Yifan Zhang, Vincent Tao Hu, Björn Ommer, Rania Briq, Stefan Kesselheim

Quantization Networks

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua