Related papers: Automated Backend-Aware Post-Training Quantization

200 papers

Quantization is critical for efficiently deploying large language models (LLMs). Yet conventional methods remain hardware-agnostic, limited to bit-width constraints, and do not account for intrinsic circuit characteristics such as the…

Hardware Architecture · Computer Science 2025-11-18 Rohan Juneja, Shivam Aggarwal, Safeen Huda, Tulika Mitra, Li-Shiuan Peh

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…

Computer Vision and Pattern Recognition · Computer Science 2019-04-09 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

With the recent demand of deploying neural network models on mobile and edge devices, it is desired to improve the model's generalizability on unseen testing data, as well as enhance the model's robustness under fixed-point quantization for…

Machine Learning · Computer Science 2021-11-26 Huanrui Yang, Xiaoxuan Yang, Neil Zhenqiang Gong, Yiran Chen

Graph Neural Networks (GNNs) are becoming increasingly popular due to their superior performance in critical graph-related tasks. While quantization is widely used to accelerate GNN computation, quantized training faces unprecedented…

Machine Learning · Computer Science 2023-09-04 Shiyang Chen, Da Zheng, Caiwen Ding, Chengying Huan, Yuede Ji, Hang Liu

Neural network quantization enables the deployment of models on edge devices. An essential requirement for their hardware efficiency is that the quantizers are hardware-friendly: uniform, symmetric, and with power-of-two thresholds. To the…

Computer Vision and Pattern Recognition · Computer Science 2021-11-17 Hai Victor Habi, Reuven Peretz, Elad Cohen, Lior Dikstein, Oranit Dror, Idit Diamant, Roy H. Jennings, Arnon Netzer

Neural Radiance Field (NeRF) has emerged as a promising 3D reconstruction method, delivering high-quality results for AR/VR applications. While quantization methods and hardware accelerators have been proposed to enhance NeRF's…

Hardware Architecture · Computer Science 2025-10-13 Yipu Zhang, Chaofang Ma, Jinming Ge, Lin Jiang, Jiang Xu, Wei Zhang

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency,…

Computer Vision and Pattern Recognition · Computer Science 2020-08-14 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

Post-training quantization offers an efficient pathway to deploy super-resolution models, yet existing methods treat weight and activation quantization independently, missing their critical interplay. Through controlled experiments on…

Image and Video Processing · Electrical Eng. & Systems 2025-11-12 Hongjun Wang, Jiyuan Chen, Xuan Song, Yinqiang Zheng

Neural network quantization is frequently used to optimize model size, latency and power consumption for on-device deployment of neural networks. In many cases, a target bit-width is set for an entire network, meaning every layer get…

Machine Learning · Computer Science 2023-02-13 Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

Quantization techniques are pivotal in reducing the memory and computational demands of deep neural network inference. Existing solutions, such as ZeroQuant, offer dynamic quantization for models like BERT and GPT but overlook crucial…

Machine Learning · Computer Science 2023-10-30 Zhewei Yao, Reza Yazdani Aminabadi, Stephen Youn, Xiaoxia Wu, Elton Zheng, Yuxiong He

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science 2019-05-30 Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

Low-bit quantization emerges as one of the most promising compression approaches for deploying deep neural networks on edge devices. Mixed-precision quantization leverages a mixture of bit-widths to unleash the accuracy and efficiency…

Machine Learning · Computer Science 2024-05-24 Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yulun Zhang, Ying Li, Xianglong Liu

Quantized training of Large Language Models (LLMs) remains an open challenge, as maintaining accuracy while performing all matrix multiplications in low precision has proven difficult. This is particularly the case when fine-tuning…

Machine Learning · Computer Science 2025-11-06 Saleh Ashkboos, Mahdi Nikdan, Soroush Tabesh, Roberto L. Castro, Torsten Hoefler, Dan Alistarh

Automatic algorithm-hardware co-design for DNN has shown great success in improving the performance of DNNs on FPGAs. However, this process remains challenging due to the intractable search space of neural network architectures and hardware…

Computer Vision and Pattern Recognition · Computer Science 2021-04-27 Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden K. H. So, Kurt Keutzer

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science 2021-06-16 Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort

Quantization is emerging as an efficient approach to promote hardware-friendly deep learning and run deep neural networks on resource-limited hardware. However, it still causes a significant decrease to the network in accuracy. We summarize…

Machine Learning · Computer Science 2021-12-03 Haotong Qin

Graph neural networks (GNNs) have demonstrated strong performance on a wide variety of tasks due to their ability to model non-uniform structured data. Despite their promise, there exists little research exploring methods to make them more…

Machine Learning · Computer Science 2021-03-16 Shyam A. Tailor, Javier Fernandez-Marques, Nicholas D. Lane

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference - not training. Previous…

Quantization-aware training (QAT) schemes have been shown to achieve near-full precision accuracy. They accomplish this by training a quantized model for multiple epochs. This is computationally expensive, mainly because of the full…

Machine Learning · Computer Science 2024-11-19 Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou, Martino Dazzi