English
Related papers

Related papers: Exploring Neural Networks Quantization via Layer-W…

200 papers

In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving…

Machine Learning · Computer Science 2023-04-06 Johannes Maly , Rayan Saab

Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks - for a…

Machine Learning · Computer Science 2019-02-07 Eldad Meller , Alexander Finkelstein , Uri Almog , Mark Grobman

Network quantization is a powerful technique to compress convolutional neural networks. The quantization granularity determines how to share the scaling factors in weights, which affects the performance of network quantization. Most…

Computer Vision and Pattern Recognition · Computer Science 2021-10-19 Zhihang Yuan , Yiqi Chen , Chenhao Xue , Chenguang Zhang , Qiankun Wang , Guangyu Sun

Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or…

Machine Learning · Computer Science 2020-03-17 Yury Nahshan , Brian Chmiel , Chaim Baskin , Evgenii Zheltonozhskii , Ron Banner , Alex M. Bronstein , Avi Mendelson

Quantization is emerging as an efficient approach to promote hardware-friendly deep learning and run deep neural networks on resource-limited hardware. However, it still causes a significant decrease to the network in accuracy. We summarize…

Machine Learning · Computer Science 2021-12-03 Haotong Qin

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science 2019-12-02 Jiwei Yang , Xu Shen , Jun Xing , Xinmei Tian , Houqiang Li , Bing Deng , Jianqiang Huang , Xiansheng Hua

Enabling low precision implementations of deep learning models, without considerable performance degradation, is necessary in resource and latency constrained settings. Moreover, exploiting the differences in sensitivity to quantization…

Machine Learning · Computer Science 2022-10-28 Ignacio Hounie , Juan Elenter , Alejandro Ribeiro

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science 2021-06-16 Markus Nagel , Marios Fournarakis , Rana Ali Amjad , Yelysei Bondarenko , Mart van Baalen , Tijmen Blankevoort

Generalization to unseen data remains poorly understood for deep learning classification and foundation models, especially in the open set scenario. How can one assess the ability of networks to adapt to new or extended versions of their…

Machine Learning · Computer Science 2024-11-05 Luciano Dyballa , Evan Gerritz , Steven W. Zucker

Quantization lowers memory usage, computational requirements, and latency by utilizing fewer bits to represent model weights and activations. In this work, we investigate the generalization properties of quantized neural networks, a…

We present a simple meta quantization approach that quantizes different layers of a large language model (LLM) at different bit levels, and is independent of the underlying quantization technique. Specifically, we quantize the most…

Computation and Language · Computer Science 2024-10-29 Razvan-Gabriel Dumitru , Vikas Yadav , Rishabh Maheshwary , Paul-Ioan Clotan , Sathwik Tejaswi Madhusudhan , Mihai Surdeanu

Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate low-bit compressed network. In this…

Machine Learning · Computer Science 2018-12-18 Yuhui Xu , Yongzhuang Wang , Aojun Zhou , Weiyao Lin , Hongkai Xiong

Deep neural networks (DNNs) are quantized for efficient inference on resource-constrained platforms. However, training deep learning models with low-precision weights and activations involves a demanding optimization task, which calls for…

Machine Learning · Computer Science 2021-05-25 Ziang Long , Penghang Yin , Jack Xin

Deep Neural Networks reached state-of-the-art performance across numerous domains, but this progress has come at the cost of increasingly large and over-parameterized models, posing serious challenges for deployment on resource-constrained…

Machine Learning · Computer Science 2026-02-04 Dario Malchiodi , Mattia Ferraretto , Marco Frasca

Generalization of deep neural networks remains one of the main open problems in machine learning. Previous theoretical works focused on deriving tight bounds of model complexity, while empirical works revealed that neural networks exhibit…

Machine Learning · Computer Science 2022-01-31 James Wang , Cheng-Lin Yang

Operating deep neural networks (DNNs) on devices with limited resources requires the reduction of their memory as well as computational footprint. Popular reduction methods are network quantization or pruning, which either reduce the word…

In low-latency or mobile applications, lower computation complexity, lower memory footprint and better energy efficiency are desired. Many prior works address this need by removing redundant parameters. Parameter quantization replaces…

Machine Learning · Computer Science 2021-11-16 Cheng-Chou Lan

With the proliferation of deep convolutional neural network (CNN) algorithms for mobile processing, limited precision quantization has become an essential tool for CNN efficiency. Consequently, various works have sought to design fixed…

Machine Learning · Computer Science 2020-12-01 Stone Yun , Alexander Wong

Post-training quantization is widely employed to reduce the computational demands of neural networks. Typically, individual substructures, such as layers or blocks of layers, are quantized with the objective of minimizing quantization…

Machine Learning · Computer Science 2024-11-07 Khasmamad Shabanovi , Lukas Wiest , Vladimir Golkov , Daniel Cremers , Thomas Pfeil

With unprecedented rapid development, deep neural networks (DNNs) have deeply influenced almost all fields. However, their heavy computation costs and model sizes are usually unacceptable in real-world deployment. Model quantization, an…

Machine Learning · Computer Science 2025-05-12 Kai Liu , Qian Zheng , Kaiwen Tao , Zhiteng Li , Haotong Qin , Wenbo Li , Yong Guo , Xianglong Liu , Linghe Kong , Guihai Chen , Yulun Zhang , Xiaokang Yang
‹ Prev 1 2 3 10 Next ›