Related papers: QwT-v2: Practical, Effective and Efficient Post-Tr…

200 papers

Deep neural networks, while achieving remarkable success across diverse tasks, demand significant resources, including computation, GPU memory, bandwidth, storage, and energy. Network quantization, as a standard compression and acceleration…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-09 · Minghao Fu, Hao Yu, Jie Shao, Junjie Zhou, Ke Zhu, Jianxin Wu

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge…

Machine Learning · Computer Science · 2021-06-16 · Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase incorporates the simulation of the low-precision computation to optimize the quantization parameters in alignment with the task…

Machine Learning · Computer Science · 2024-12-23 · Chengting Yu, Shu Yang, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Er-Ping Li

The post-training quantization (PTQ) challenge of bringing quantized neural network accuracy close to that of the original model has drawn much attention, driven by industry demand. Many of the methods emphasize optimization of a specific degree-of-freedom (DoF),…

Machine Learning · Statistics · 2023-03-21 · Alex Finkelstein, Ella Fuchs, Idan Tal, Mark Grobman, Niv Vosco, Eldad Meller

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-09 · Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

Foundation models have achieved remarkable results in medical image analysis. However, their large network architectures and high computational complexity significantly impact inference speed, limiting their application on terminal medical…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-10 · Yineng Chen, Peng Huang, Aozhong Zhang, Hui Guo, Penghang Yin, Shu Hu, Shao Lin, Xin Li, Tzu-Jen Kao, Balakrishnan Prabhakaran, MingChing Chang, Xin Wang

Deep neural networks have been proven effective in a wide range of tasks. However, their high computational and memory costs make them impractical to deploy on resource-constrained devices. To address this issue, quantization schemes have…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-14 · Jie Hu, Mengze Zeng, Enhua Wu

Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency while effectively maintaining model performance. The paradigm of QAT is to introduce fake…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-25 · Wenqiang Zhou, Zhendong Yu, Xinyu Liu, Jiaming Yang, Rong Xiao, Tao Wang, Chenwei Tang, Jiancheng Lv

Network quantization significantly reduces model inference complexity and has been widely used in real-world deployments. However, most existing quantization methods have been developed mainly on Convolutional Neural Networks (CNNs), and…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-20 · Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, Shuchang Zhou

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources. Neural network quantization has significant benefits in reducing the amount of…

Computer Vision and Pattern Recognition · Computer Science · 2019-05-30 · Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

Deep neural networks have achieved state-of-the-art results in a wide range of applications, from natural language processing and computer vision to speech recognition. However, as tasks become increasingly complex, model sizes continue to…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-21 · Tomer Gafni, Asaf Karnieli, Yair Hanani

Although neural networks have made remarkable advancements in various applications, they require substantial computational and memory resources. Network quantization is a powerful technique to compress neural networks, allowing for more…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-19 · Dawei Yang, Ning He, Xing Hu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang

Quantization-aware training (QAT) schemes have been shown to achieve near-full precision accuracy. They accomplish this by training a quantized model for multiple epochs. This is computationally expensive, mainly because of the full…

Machine Learning · Computer Science · 2024-11-19 · Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou, Martino Dazzi

Although deep neural networks are highly effective, their high computational and memory costs severely challenge their applications on portable devices. As a consequence, low-bit quantization, which converts a full-precision neural network…

Computer Vision and Pattern Recognition · Computer Science · 2019-12-02 · Jiwei Yang, Xu Shen, Jun Xing, Xinmei Tian, Houqiang Li, Bing Deng, Jianqiang Huang, Xiansheng Hua

Deploying Deep Neural Networks in low-power embedded devices for real time-constrained applications requires optimization of memory and computational complexity of the networks, usually by quantizing the weights. Most of the existing works…

Machine Learning · Computer Science · 2022-03-11 · Dominika Przewlocka-Rus, Syed Shakib Sarwar, H. Ekin Sumbul, Yuecheng Li, Barbara De Salvo

While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized…

Machine Learning · Computer Science · 2023-01-18 · Jinjie Zhang, Yixuan Zhou, Rayan Saab

Reasoning models excel at complex tasks such as coding and mathematics, yet their inference is often slow and token-inefficient. To improve the inference efficiency, post-training quantization (PTQ) usually comes with the cost of large…

Machine Learning · Computer Science · 2026-01-22 · Keyu Lv, Manyi Zhang, Xiaobo Xia, Jingchen Ni, Shannan Yan, Xianzhi Yu, Lu Hou, Chun Yuan, Haoli Bai

Multi-bit quantization networks enable flexible deployment of deep neural networks by supporting multiple precision levels within a single model. However, existing approaches suffer from significant training overhead as full-dataset updates…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-24 · Jinhee Kim, Jae Jun An, Kang Eun Jeon, Jong Hwan Ko

Despite the success of CNN models on a variety of image classification and segmentation tasks, their extensive computational and storage demands pose considerable challenges for real-world deployment on resource-constrained devices.…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-10 · Ahmed Luqman, Khuzemah Qazi, Murray Patterson, Malik Jahan Khan, Imdadullah Khan

Deep Neural Networks (DNNs) have achieved significant advances in a wide range of applications. However, their deployment on resource-constrained devices remains a challenge due to the large number of layers and parameters, which result in…

Neural and Evolutionary Computing · Computer Science · 2025-09-05 · Sara Makenali, Babak Rokh, Ali Azarpeyvand