Related papers: Long-Range Zero-Shot Generative Deep Network Quant…

Zero-shot Adversarial Quantization

Model quantization is a promising approach to compress deep neural networks and accelerate inference, making it possible to be deployed on mobile and edge devices. To retain the high performance of full-precision models, most existing…

Computer Vision and Pattern Recognition · Computer Science 2021-03-31 Yuang Liu, Wei Zhang, Jun Wang

Sharpness-Aware Data Generation for Zero-shot Quantization

Zero-shot quantization aims to learn a quantized model from a pre-trained full-precision model with no access to original real training data. The common idea in zero-shot quantization approaches is to generate synthetic data for quantizing…

Machine Learning · Computer Science 2025-10-09 Dung Hoang-Anh, Cuong Pham, Trung Le, Jianfei Cai, Thanh-Toan Do

Zero-shot Quantization: A Comprehensive Survey

Network quantization has proven to be a powerful approach to reduce the memory and computational demands of deep learning models for deployment on resource-constrained devices. However, traditional quantization methods often rely on access…

Computer Vision and Pattern Recognition · Computer Science 2025-05-15 Minjun Kim, Jaehyeon Choi, Jongkeun Lee, Wonjin Cho, U Kang

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. While having no access to original training data due to security and privacy…

Computation and Language · Computer Science 2023-10-23 Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Generative Low-bitwidth Data Free Quantization

Neural network quantization is an effective way to compress deep models and improve their execution latency and energy efficiency, so that they can be deployed on mobile or embedded devices. Existing quantization methods require original…

Computer Vision and Pattern Recognition · Computer Science 2020-08-11 Shoukai Xu, Haokun Li, Bohan Zhuang, Jing Liu, Jiezhang Cao, Chuangrun Liang, Mingkui Tan

GenQ: Quantization in Low Data Regimes with Generative Synthetic Data

In the realm of deep neural network deployment, low-bit quantization presents a promising avenue for enhancing computational efficiency. However, it often hinges on the availability of training data to mitigate quantization errors, a…

Computer Vision and Pattern Recognition · Computer Science 2024-09-18 Yuhang Li, Youngeun Kim, Donghyun Lee, Souvik Kundu, Priyadarshini Panda

SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning

How can we accurately quantize a pre-trained model without any data? Quantization algorithms are widely used for deploying neural networks on resource-constrained edge devices. Zero-shot Quantization (ZSQ) addresses the crucial and…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Minjun Kim, Jongjin Kim, U Kang

ZeroQ: A Novel Zero Shot Quantization Framework

Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization.…

Computer Vision and Pattern Recognition · Computer Science 2020-03-29 Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

Hard Sample Matters a Lot in Zero-Shot Quantization

Zero-shot quantization (ZSQ) is promising for compressing and accelerating deep neural networks when the data for training full-precision models are inaccessible. In ZSQ, network quantization is performed using synthetic samples, thus, the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-27 Huantong Li, Xiangmiao Wu, Fanbing Lv, Daihai Liao, Thomas H. Li, Yonggang Zhang, Bo Han, Mingkui Tan

Genie: Show Me the Data for Quantization

Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible owing to various reasons, including cost and issues related to privacy. By exploiting the learned parameters ($\mu$ and…

Machine Learning · Computer Science 2023-08-09 Yongkweon Jeon, Chungman Lee, Ho-young Kim

Data Generation for Hardware-Friendly Post-Training Quantization

Zero-shot quantization (ZSQ) using synthetic data is a key approach for post-training quantization (PTQ) under privacy and security constraints. However, existing data generation methods often struggle to effectively generate data suitable…

Machine Learning · Computer Science 2025-02-06 Lior Dikstein, Ariel Lapid, Arnon Netzer, Hai Victor Habi

A Generalized Zero-Shot Quantization of Deep Convolutional Neural Networks via Learned Weights Statistics

Quantizing the floating-point weights and activations of deep convolutional neural networks to fixed-point representation yields reduced memory footprints and inference time. Recently, efforts have been afoot towards zero-shot quantization…

Computer Vision and Pattern Recognition · Computer Science 2021-12-14 Prasen Kumar Sharma, Arun Abraham, Vikram Nelvoy Rajendiran

IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization

Learning to synthesize data has emerged as a promising direction in zero-shot quantization (ZSQ), which represents neural networks by low-bit integer without accessing any of the real data. In this paper, we observe an interesting…

Computer Vision and Pattern Recognition · Computer Science 2022-03-11 Yunshan Zhong, Mingbao Lin, Gongrui Nan, Jianzhuang Liu, Baochang Zhang, Yonghong Tian, Rongrong Ji

Towards Feature Distribution Alignment and Diversity Enhancement for Data-Free Quantization

To obtain lower inference latency and less memory footprint of deep neural networks, model quantization has been widely employed in deep model deployment, by converting the floating points to low-precision integers. However, previous…

Computer Vision and Pattern Recognition · Computer Science 2022-12-20 Yangcheng Gao, Zhao Zhang, Richang Hong, Haijun Zhang, Jicong Fan, Shuicheng Yan

Task-Specific Zero-shot Quantization-Aware Training for Object Detection

Quantization is a key technique to reduce network size and computational complexity by representing the network parameters with a lower precision. Traditional quantization methods rely on access to original training data, which is often…

Computer Vision and Pattern Recognition · Computer Science 2025-07-23 Changhao Li, Xinrui Chen, Ji Wang, Kang Zhao, Jianfei Chen

GranQ: Efficient Channel-wise Quantization via Vectorized Pre-Scaling for Zero-Shot QAT

Zero-shot quantization (ZSQ) enables neural network compression without original training data, making it a promising solution for restricted data access scenarios. To compensate for the lack of data, recent ZSQ methods typically rely on…

Computer Vision and Pattern Recognition · Computer Science 2025-10-16 Inpyo Hong, Youngwan Jo, Hyojeong Lee, Sunghyun Ahn, Kijung Lee, Sanghyun Park

Generative Zero-shot Network Quantization

Convolutional neural networks are able to learn realistic image priors from numerous training samples in low-level image generation and restoration. We show that, for high-level image recognition tasks, we can further reconstruct…

Computer Vision and Pattern Recognition · Computer Science 2021-01-22 Xiangyu He, Qinghao Hu, Peisong Wang, Jian Cheng

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…

Computer Vision and Pattern Recognition · Computer Science 2021-11-03 Kohei Yamamoto

LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid

Large language models (LLMs) have shown immense potential across various domains, but their high memory requirements and inference costs remain critical challenges for deployment. Post-training quantization (PTQ) has emerged as a promising…

Machine Learning · Computer Science 2026-01-05 Tianyi Zhang, Anshumali Shrivastava

DNN Quantization with Attention

Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, low-bit quantization can also cause a considerable drop…

Computer Vision and Pattern Recognition · Computer Science 2021-03-25 Ghouthi Boukli Hacene, Lukas Mauch, Stefan Uhlich, Fabien Cardinaux