Related papers: SplitQuant: Layer Splitting for Low-Bit Neural Net…

Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks

Deep Neural Networks (DNNs) have many parameters and activation data, and both are expensive to implement. One method to reduce the size of the DNN is to quantize the pre-trained model by using a low-bit expression for weights and…

Computer Vision and Pattern Recognition · Computer Science 2020-11-26 Jun Nishikawa, Ryoji Ikegaya

MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search

Quantization is a technique for creating efficient Deep Neural Networks (DNNs), which involves performing computations and storing tensors at lower bit-widths than f32 floating point precision. Quantization reduces model size and inference…

Machine Learning · Computer Science 2023-10-02 Eliska Kloberdanz, Wei Le

Improving Neural Network Quantization without Retraining using Outlier Channel Splitting

Quantization can improve the execution latency and energy efficiency of neural networks on both commodity GPUs and specialized accelerators. The majority of existing literature focuses on training quantized DNNs, while this work examines…

Machine Learning · Computer Science 2019-05-24 Ritchie Zhao, Yuwei Hu, Jordan Dotzel, Christopher De Sa, Zhiru Zhang

SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs

The quantization of large language models (LLMs) is crucial for deploying them on devices with limited computational resources. While advanced quantization algorithms offer improved performance compared to the basic linear quantization,…

Machine Learning · Computer Science 2025-03-12 Jaewoo Song, Fangzhen Lin

Designing strong baselines for ternary neural network quantization through support and mass equalization

Deep neural networks (DNNs) offer the highest performance in a wide range of applications in computer vision. These results rely on over-parameterized backbones, which are expensive to run. This computational burden can be dramatically…

Computer Vision and Pattern Recognition · Computer Science 2023-07-03 Edouard Yvinec, Arnaud Dapogny, Kevin Bailly

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

Low-bit deep neural networks (DNNs) become critical for embedded applications due to their low storage requirement and computing efficiency. However, they suffer much from the non-negligible accuracy drop. This paper proposes the stochastic…

Computer Vision and Pattern Recognition · Computer Science 2017-08-04 Yinpeng Dong, Renkun Ni, Jianguo Li, Yurong Chen, Jun Zhu, Hang Su

Subtensor Quantization for Mobilenets

Quantization for deep neural networks (DNNs) has enabled developers to deploy models with less memory and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular Mobilenet…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Thu Dinh, Andrey Melnikov, Vasilios Daskalopoulos, Sek Chai

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…

Machine Learning · Computer Science 2023-09-01 Clemens JS Schaefer, Siddharth Joshi, Shan Li, Raul Blazquez

StatQAT: Statistical Quantizer Optimization for Deep Networks

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes,…

Machine Learning · Statistics 2026-05-19 Mehmet Aktukmak, Daniel Huang, Ke Ding

A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification

Recent advancements in machine learning achieved by Deep Neural Networks (DNNs) have been significant. While demonstrating high accuracy, DNNs are associated with a huge number of parameters and computations, which leads to high memory…

Machine Learning · Computer Science 2023-12-20 Babak Rokh, Ali Azarpeyvand, Alireza Khanteymoori

SplitEE: Early Exit in Deep Neural Networks with Split Computing

Deep Neural Networks (DNNs) have drawn attention because of their outstanding performance on various tasks. However, deploying full-fledged DNNs in resource-constrained devices (edge, mobile, IoT) is difficult due to their large size. To…

Machine Learning · Computer Science 2023-09-19 Divya J. Bajpai, Vivek K. Trivedi, Sohan L. Yadav, Manjesh K. Hanawal

Class-based Quantization for Neural Networks

In deep neural networks (DNNs), there are a huge number of weights and multiply-and-accumulate (MAC) operations. Accordingly, it is challenging to apply DNNs on resource-constrained platforms, e.g., mobile phones. Quantization is a method…

Machine Learning · Computer Science 2022-11-29 Wenhao Sun, Grace Li Zhang, Huaxi Gu, Bing Li, Ulf Schlichtmann

Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms

Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback however is the high computational complexity and high memory consumption of CNNs which makes them…

Computer Vision and Pattern Recognition · Computer Science 2021-02-04 Rishabh Goyal, Joaquin Vanschoren, Victor van Acht, Stephan Nijssen

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Quantized Neural Networks (QNNs), which use low bitwidth numbers for representing parameters and performing computations, have been proposed to reduce the computation complexity, storage size and memory usage. In QNNs, parameters and…

Computer Vision and Pattern Recognition · Computer Science 2017-06-23 Shuchang Zhou, Yuzhi Wang, He Wen, Qinyao He, Yuheng Zou

DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs

Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations that impede efficient low-bit representation. Traditional approaches predominantly address Normal Outliers,…

Computation and Language · Computer Science 2024-11-04 Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei

DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory

Quantizing the weights of a neural network has two steps: (1) Finding a good low bit-complexity representation for weights (which we call the quantization grid) and (2) Rounding the original weights to values in the quantization grid. In…

Machine Learning · Computer Science 2025-01-14 Jerry Chee, Arturs Backurs, Rainie Heck, Li Zhang, Janardhan Kulkarni, Thomas Rothvoss, Sivakanth Gopi

Standard Deviation-Based Quantization for Deep Neural Networks

Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices. Inspired by existing methods, we propose a new framework to learn the…

Machine Learning · Computer Science 2022-02-28 Amir Ardakani, Arash Ardakani, Brett Meyer, James J. Clark, Warren J. Gross

Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation

With pervasive applications of medical imaging in health-care, biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. Since manual annotation suffers limited…

Computer Vision and Pattern Recognition · Computer Science 2018-03-14 Xiaowei Xu, Qing Lu, Yu Hu, Lin Yang, Sharon Hu, Danny Chen, Yiyu Shi

PIPE : Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions

Deep neural networks (DNNs) are ubiquitous in computer vision and natural language processing, but suffer from high inference cost. This problem can be addressed by quantization, which consists in converting floating point operations into a…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Edouard Yvinec, Arnaud Dapogny, Kevin Bailly

Towards Accurate and Efficient Sub-8-Bit Integer Training

Neural network training is a memory- and compute-intensive task. Quantization, which enables low-bitwidth formats in training, can significantly mitigate the workload. To reduce quantization error, recent methods have developed new data…

Machine Learning · Computer Science 2024-11-19 Wenjin Guo, Donglai Liu, Weiying Xie, Yunsong Li, Xuefei Ning, Zihan Meng, Shulin Zeng, Jie Lei, Zhenman Fang, Yu Wang