Related papers: Quantization Loss Re-Learning Method

Effective Quantization Methods for Recurrent Neural Networks

Reducing bit-widths of weights, activations, and gradients of a Neural Network can shrink its storage size and memory usage, and also allow for faster training and inference by exploiting bitwise operations. However, previous attempts for…

Machine Learning · Computer Science 2016-12-01 Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou

Low-Rank Quantization-Aware Training for LLMs

Large language models (LLMs) are omnipresent; however, their practical deployment is challenging due to their ever-increasing computational and memory demands. Quantization is one of the most effective ways to make them more compute and…

Machine Learning · Computer Science 2024-09-04 Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel

Empirical Evaluation of A New Approach to Simplifying Long Short-term Memory (LSTM)

The standard LSTM, although it succeeds in modeling long-range dependencies, suffers from a highly complex structure that can be simplified through modifications to its gate units. This paper performs an empirical comparison…

Neural and Evolutionary Computing · Computer Science 2016-12-13 Yuzhen Lu

Towards Binary-Valued Gates for Robust LSTM Training

Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although…

Machine Learning · Computer Science 2018-06-11 Zhuohan Li, Di He, Fei Tian, Wei Chen, Tao Qin, Liwei Wang, Tie-Yan Liu

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to significantly reduce hardware resource requirements while maintaining the accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee, Byeongwook Kim

Sensitivity-Aware Post-Training Quantization for Deep Neural Networks

Model quantization reduces neural network parameter precision to achieve compression, but often compromises accuracy. Existing post-training quantization (PTQ) methods employ iterative parameter updates to preserve accuracy under high…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zekang Zheng, Haokun Li, Yaofo Chen, Mingkui Tan, Qing Du

Yet Unnoticed in LSTM: Binary Tree Based Input Reordering, Weight Regularization, and Gate Nonlinearization

LSTM models, as used in current machine learning literature and applications, offer a promising solution for retaining long-term information through gating mechanisms that forget and reduce the effect of current input information. However, even with…

Machine Learning · Computer Science 2025-09-03 Mojtaba Moattari

RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures

Existing neural networks are memory-consuming and computationally intensive, making deploying them challenging in resource-constrained environments. However, there are various methods to improve their efficiency. Two such methods are…

Machine Learning · Computer Science 2023-11-10 Anastasiia Prutianova, Alexey Zaytsev, Chung-Kuei Lee, Fengyu Sun, Ivan Koryakovskiy

Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition

State-of-the-art language models (LMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications. Low-bit neural network…

Computation and Language · Computer Science 2021-12-22 Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng

Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting

Although neural networks have made remarkable advancements in various applications, they require substantial computational and memory resources. Network quantization is a powerful technique to compress neural networks, allowing for more…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Dawei Yang, Ning He, Xing Hu, Zhihang Yuan, Jiangyong Yu, Chen Xu, Zhe Jiang

QL-LSTM: A Parameter-Efficient LSTM for Stable Long-Sequence Modeling

Recurrent neural architectures such as LSTM and GRU remain widely used in sequence modeling, but they continue to face two core limitations: redundant gate-specific parameters and reduced ability to retain information across long temporal…

Machine Learning · Computer Science 2025-12-09 Isaac Kofi Nti

Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs

Large language models of high parameter counts are computationally expensive, yet can be made much more efficient by compressing their weights to very low numerical precision. This can be achieved either through post-training quantization…

Machine Learning · Computer Science 2025-04-21 Zifei Xu, Sayeh Sharify, Wanzin Yazar, Tristan Webb, Xin Wang

Learning in Gated Neural Networks

Gating is a key feature in modern neural networks including LSTMs, GRUs and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, where several experts make regression decisions and gating…

Machine Learning · Computer Science 2020-06-19 Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath

Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?

Large language models (LLMs) require immense resources for training and inference. Quantization, a technique that reduces the precision of model parameters, offers a promising solution for improving LLM efficiency and sustainability. While…

Machine Learning · Computer Science 2025-02-18 Jacob Nielsen, Peter Schneider-Kamp, Lukas Galke

FineGates: LLMs Finetuning with Compression using Stochastic Gates

Large Language Models (LLMs), with billions of parameters, present significant challenges for full finetuning due to the high computational demands, memory requirements, and impracticality of many real-world applications. When faced with…

Machine Learning · Computer Science 2024-12-18 Jonathan Svirsky, Yehonathan Refael, Ofir Lindenbaum

DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models

Improving the efficiency of inference in Large Language Models (LLMs) is a critical area of research. Post-training Quantization (PTQ) is a popular technique, but it often faces challenges at low-bit levels, particularly in downstream…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Wenjin Ke, Zhe Li, Dong Li, Lu Tian, Emad Barsoum

Attention Round for Post-Training Quantization

At present, quantization methods for neural network models are mainly divided into post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization needs only a small part of the data to complete the…

Machine Learning · Computer Science 2022-07-08 Huabin Diao, Gongyan Li, Shaoyun Xu, Yuexing Hao

A Comprehensive Evaluation of Quantization Strategies for Large Language Models

Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs, making deployment difficult in resource-limited settings. Quantization techniques,…

Computation and Language · Computer Science 2024-06-07 Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Loss Aware Post-training Quantization

Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or…

Machine Learning · Computer Science 2020-03-17 Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners

The quantization of large language models (LLMs) has been a prominent research area aimed at enabling their lightweight deployment in practice. Existing research on LLM quantization has mainly explored the interplay between weights and…

Computation and Language · Computer Science 2025-05-16 Yifei Gao, Jie Ou, Lei Wang, Jun Cheng, Mengchu Zhou