Related papers: Exploring Neural Networks Quantization via Layer-W…

A simple approach for quantizing neural networks

In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving…

Machine Learning · Computer Science 2023-04-06 Johannes Maly , Rayan Saab

Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization

Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks - for a…

Machine Learning · Computer Science 2019-02-07 Eldad Meller , Alexander Finkelstein , Uri Almog , Mark Grobman

PTQ-SL: Exploring the Sub-layerwise Post-training Quantization

Network quantization is a powerful technique to compress convolutional neural networks. The quantization granularity determines how to share the scaling factors in weights, which affects the performance of network quantization. Most…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Zhihang Yuan , Yiqi Chen , Chenhao Xue , Chenguang Zhang , Qiankun Wang , Guangyu Sun

Loss Aware Post-training Quantization

Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or…

Machine Learning · Computer Science 2020-03-17 Yury Nahshan , Brian Chmiel , Chaim Baskin , Evgenii Zheltonozhskii , Ron Banner , Alex M. Bronstein , Avi Mendelson

Hardware-friendly Deep Learning by Network Quantization and Binarization

Quantization is emerging as an efficient approach to promote hardware-friendly deep learning and run deep neural networks on resource-limited hardware. However, it still causes a significant decrease to the network in accuracy. We summarize…

Machine Learning · Computer Science 2021-12-03 Haotong Qin

Quantization Networks

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang , Xu Shen , Jun Xing , Xinmei Tian , Houqiang Li , Bing Deng , Jianqiang Huang , Xiansheng Hua

Neural Networks with Quantization Constraints

Enabling low precision implementations of deep learning models, without considerable performance degradation, is necessary in resource and latency constrained settings. Moreover, exploiting the differences in sensitivity to quantization…

Machine Learning · Computer Science 2022-10-28 Ignacio Hounie , Juan Elenter , Alejandro Ribeiro

A White Paper on Neural Network Quantization

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science 2021-06-16 Markus Nagel , Marios Fournarakis , Rana Ali Amjad , Yelysei Bondarenko , Mart van Baalen , Tijmen Blankevoort

A separability-based approach to quantifying generalization: which layer is best?

Generalization to unseen data remains poorly understood for deep learning classification and foundation models, especially in the open set scenario. How can one assess the ability of networks to adapt to new or extended versions of their…

Machine Learning · Computer Science 2024-11-05 Luciano Dyballa , Evan Gerritz , Steven W. Zucker

QGen: On the Ability to Generalize in Quantization Aware Training

Quantization lowers memory usage, computational requirements, and latency by utilizing fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a…

Machine Learning · Computer Science 2024-04-22 MohammadHossein AskariHemmat , Ahmadreza Jeddi , Reyhane Askari Hemmat , Ivan Lazarevich , Alexander Hoffman , Sudhakar Sah , Ehsan Saboori , Yvon Savaria , Jean-Pierre David

Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels

We present a simple meta quantization approach that quantizes different layers of a large language model (LLM) at different bit levels, and is independent of the underlying quantization technique. Specifically, we quantize the most…

Computation and Language · Computer Science 2024-10-29 Razvan-Gabriel Dumitru , Vikas Yadav , Rishabh Maheshwary , Paul-Ioan Clotan , Sathwik Tejaswi Madhusudhan , Mihai Surdeanu

Deep Neural Network Compression with Single and Multiple Level Quantization

Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate low-bit compressed network. In this…

Machine Learning · Computer Science 2018-12-18 Yuhui Xu , Yongzhuang Wang , Aojun Zhou , Weiyao Lin , Hongkai Xiong

Recurrence of Optimum for Training Weight and Activation Quantized Networks

Deep neural networks (DNNs) are quantized for efficient inference on resource-constrained platforms. However, training deep learning models with low-precision weights and activations involves a demanding optimization task, which calls for…

Machine Learning · Computer Science 2021-05-25 Ziang Long , Penghang Yin , Jack Xin

Quantization-Aware Regularizers for Deep Neural Networks Compression

Deep Neural Networks reached state-of-the-art performance across numerous domains, but this progress has come at the cost of increasingly large and over-parameterized models, posing serious challenges for deployment on resource-constrained…

Machine Learning · Computer Science 2026-02-04 Dario Malchiodi , Mattia Ferraretto , Marco Frasca

With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization

Generalization of deep neural networks remains one of the main open problems in machine learning. Previous theoretical works focused on deriving tight bounds of model complexity, while empirical works revealed that neural networks exhibit…

Machine Learning · Computer Science 2022-01-31 James Wang , Cheng-Lin Yang

Iteratively Training Look-Up Tables for Network Quantization

Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word…

Machine Learning · Computer Science 2023-07-19 Fabien Cardinaux , Stefan Uhlich , Kazuki Yoshiyama , Javier Alonso Garcia , Lukas Mauch , Stephen Tiedemann , Thomas Kemp , Akira Nakamura

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

In low-latency or mobile applications, lower computation complexity, lower memory footprint and better energy efficiency are desired. Many prior works address this need by removing redundant parameters. Parameter quantization replaces…

Machine Learning · Computer Science 2021-11-16 Cheng-Chou Lan

Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks

With the proliferation of deep convolutional neural network (CNN) algorithms for mobile processing, limited precision quantization has become an essential tool for CNN efficiency. Consequently, various works have sought to design fixed…

Machine Learning · Computer Science 2020-12-01 Stone Yun , Alexander Wong

Interactions Across Blocks in Post-Training Quantization of Large Language Models

Post-training quantization is widely employed to reduce the computational demands of neural networks. Typically, individual substructures, such as layers or blocks of layers, are quantized with the objective of minimizing quantization…

Machine Learning · Computer Science 2024-11-07 Khasmamad Shabanovi , Lukas Wiest , Vladimir Golkov , Daniel Cremers , Thomas Pfeil

Low-bit Model Quantization for Deep Neural Networks: A Survey

With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an…

Machine Learning · Computer Science 2025-05-12 Kai Liu , Qian Zheng , Kaiwen Tao , Zhiteng Li , Haotong Qin , Wenbo Li , Yong Guo , Xianglong Liu , Linghe Kong , Guihai Chen , Yulun Zhang , Xiaokang Yang