Related papers: Generative Low-bitwidth Data Free Quantization
In the realm of deep neural network deployment, low-bit quantization presents a promising avenue for enhancing computational efficiency. However, it often hinges on the availability of training data to mitigate quantization errors, a…
Quantization approximates a deep network model with floating-point numbers by the one with low bit width numbers, in order to accelerate inference and reduce computation. Quantizing a model without access to the original data, zero-shot…
Data-free quantization (DFQ) recovers the performance of quantized network (Q) without the original data, but generates the fake sample via a generator (G) by learning from full-precision network (P), which, however, is totally independent…
To obtain lower inference latency and less memory footprint of deep neural networks, model quantization has been widely employed in deep model deployment, by converting the floating points to low-precision integers. However, previous…
Generative data-free quantization emerges as a practical compression approach that quantizes deep neural networks to low bit-width without accessing the real data. This approach generates data utilizing batch normalization (BN) statistics…
We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit…
Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on…
Generative adversarial networks (GANs) have an enormous potential impact on digital content creation, e.g., photo-realistic digital avatars, semantic content editing, and quality enhancement of speech and images. However, the performance of…
Data-free quantization (DFQ) is a technique that creates a lightweight network from its full-precision counterpart without the original training data, often through a synthetic dataset. Although several DFQ methods have been proposed for…
Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit…
Quantization is a widely adopted technique for deep neural networks to reduce the memory and computational resources required. However, when quantized, most models would need a suitable calibration process to keep their performance intact,…
Network quantization is an essential procedure in deep learning for development of efficient fixed-point inference models on mobile or edge platforms. However, as datasets grow larger and privacy regulations become stricter, data sharing…
Network quantization is an effective method for the deployment of neural networks on memory and energy constrained mobile devices. In this paper, we propose a Dynamic Network Quantization (DNQ) framework which is composed of two modules: a…
While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized…
Post-training quantization (PTQ) improves efficiency but struggles with limited calibration data, especially under privacy constraints. Data-free quantization (DFQ) mitigates this by generating synthetic images using generative models such…
Neural network quantization is becoming an industry standard to efficiently deploy deep learning models on hardware platforms, such as CPU, GPU, TPU, and FPGAs. However, we observe that the conventional quantization approaches are…
Existing quantization aware training methods attempt to compensate for the quantization loss by leveraging on training data, like most of the post-training quantization methods, and are also time consuming. Both these methods are not…
Data-Free Quantization (DFQ) enables the quantization of Vision Transformers (ViTs) without requiring access to data, allowing for the deployment of ViTs on devices with limited resources. In DFQ, the quantization model must be calibrated…
Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…
Quantization of deep neural networks (DNN) has been proven effective for compressing and accelerating DNN models. Data-free quantization (DFQ) is a promising approach without the original datasets under privacy-sensitive and confidential…