Related papers: Weight Equalizing Shift Scaler-Coupled Post-traini…

200 papers

Low-latency and mobile applications call for lower computational complexity, a smaller memory footprint, and better energy efficiency. Many prior works address this need by removing redundant parameters. Parameter quantization replaces…

Machine Learning · Computer Science 2021-11-16 Cheng-Chou Lan

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates…

Computer Vision and Pattern Recognition · Computer Science 2025-10-24 Jinhee Kim, Jae Jun An, Kang Eun Jeon, Jong Hwan Ko

Despite the progress recent binarization methods have made in reducing the performance degradation of Binary Neural Networks (BNNs), gradient mismatch caused by the Straight-Through Estimator (STE) still dominates quantized networks. This…

Computer Vision and Pattern Recognition · Computer Science 2020-09-11 Junjie Liu, Dongchao Wen, Deyu Wang, Wei Tao, Tse-Wei Chen, Kinya Osa, Masami Kato

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, in addition to substantial computing resources. Neural network quantization has significant benefits in reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

Although convolutional neural networks (CNNs) are now widely used in various computer vision applications, their heavy demands on parameter storage and computation make deployment on mobile and embedded devices difficult.…

Computer Vision and Pattern Recognition · Computer Science 2019-09-26 Zhe Xu, Ray C. C. Cheung

Post-training quantization has proven effective at reducing the storage of deep neural network models across various tasks. However, maintaining model accuracy under low-bit quantization is a challenging problem. In…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Bingtao Yang, Yujia Wang, Mengzhi Jiao, Hongwei Huo

8-bit quantization has been widely applied to accelerate network inference in various deep learning applications. There are two kinds of quantization methods: training-based quantization and post-training quantization. Training-based…

Computer Vision and Pattern Recognition · Computer Science 2020-07-01 Di Wu, Qi Tang, Yongle Zhao, Ming Zhang, Ying Fu, Debing Zhang

Neural network quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation, while preserving the performance of the original…

Computer Vision and Pattern Recognition · Computer Science 2022-07-05 Geon Park, Jaehong Yoon, Haiyang Zhang, Xing Zhang, Sung Ju Hwang, Yonina C. Eldar

Quantization of neural networks enables inference with lower compute and memory requirements. Previous work in quantization lacks two important aspects that this work provides. First, almost all previous work in quantization used a…

Computer Vision and Pattern Recognition · Computer Science 2025-12-12 Zia Badar

Binarization is an extreme network compression approach that provides large computational speedups along with energy and memory savings, albeit at significant accuracy costs. We investigate the question of where to binarize inputs at…

Computer Vision and Pattern Recognition · Computer Science 2018-04-12 Ameya Prabhu, Vishal Batchu, Rohit Gajawada, Sri Aurobindo Munagala, Anoop Namboodiri

Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer get…

Machine Learning · Computer Science 2023-02-13 Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

Lately, post-training quantization methods have gained considerable attention, as they are simple to use, and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant…

Machine Learning · Computer Science 2020-12-15 Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry

Quantizing large language models (LLMs) to 1-bit precision significantly reduces computational costs, but existing quantization techniques suffer from noticeable performance degradation when using weight and activation precisions below 4…

Machine Learning · Computer Science 2025-07-01 Siqing Song, Chuang Wang, Ruiqi Wang, Yi Yang, Xu-Yao Zhang

Network quantization is a powerful technique to compress convolutional neural networks. The quantization granularity determines how to share the scaling factors in weights, which affects the performance of network quantization. Most…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Zhihang Yuan, Yiqi Chen, Chenhao Xue, Chenguang Zhang, Qiankun Wang, Guangyu Sun

Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks - for a…

Machine Learning · Computer Science 2019-02-07 Eldad Meller, Alexander Finkelstein, Uri Almog, Mark Grobman

Quantizing weights and activations of deep neural networks results in significant improvement in inference efficiency at the cost of lower accuracy. A source of the accuracy gap between full precision and quantized models is the…

Machine Learning · Computer Science 2020-06-16 Hadi Pouransari, Zhucheng Tu, Oncel Tuzel

Neural networks have shown great performance in cognitive tasks. When deploying network models on mobile devices with limited resources, weight quantization has been widely adopted. Binary quantization obtains the highest compression but…

Computer Vision and Pattern Recognition · Computer Science 2018-11-14 Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei Huang, Feng Yan, Hai Li, Yiran Chen

Recently, transformers have achieved remarkable performance on a variety of computer vision applications. Compared with mainstream convolutional neural networks, vision transformers often have sophisticated architectures for extracting…

Computer Vision and Pattern Recognition · Computer Science 2021-06-29 Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao

Adder Neural Network (AdderNet) provides a new way to develop energy-efficient neural networks by replacing the expensive multiplications in convolution with cheaper additions (i.e., l1-norm). To achieve higher hardware efficiency, it is…

Computer Vision and Pattern Recognition · Computer Science 2022-12-21 Ying Nie, Kai Han, Haikang Diao, Chuanjian Liu, Enhua Wu, Yunhe Wang

Foundation models have achieved remarkable results in medical image analysis. However, their large network architectures and high computational complexity significantly impact inference speed, limiting their application on terminal medical…

Computer Vision and Pattern Recognition · Computer Science 2026-04-10 Yineng Chen, Peng Huang, Aozhong Zhang, Hui Guo, Penghang Yin, Shu Hu, Shao Lin, Xin Li, Tzu-Jen Kao, Balakrishnan Prabhakaran, MingChing Chang, Xin Wang