Related papers: Deep Learning-Based Quantization of L-Values for G…

Deep Log-Likelihood Ratio Quantization

In this work, a deep learning-based method for log-likelihood ratio (LLR) lossy compression and quantization is proposed, with emphasis on a single-input single-output uncorrelated fading communication setting. A deep autoencoder network is…

Machine Learning · Computer Science 2021-05-11 Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

Learning Content-Weighted Deep Image Compression

Learning-based lossy image compression usually involves the joint optimization of rate-distortion performance. Most existing methods adopt spatially invariant bit length allocation and incorporate discrete entropy approximation to constrain…

Computer Vision and Pattern Recognition · Computer Science 2019-04-02 Mu Li, Wangmeng Zuo, Shuhang Gu, Jane You, David Zhang

Quantization Design for Deep Learning-Based CSI Feedback

Deep learning-based autoencoders have been employed to compress and reconstruct channel state information (CSI) in frequency-division duplex systems. Practical implementations require judicious quantization of encoder outputs for digital…

Signal Processing · Electrical Eng. & Systems 2025-03-12 Manru Yin, Shengqian Han, Chenyang Yang

Low-Rank Quantization-Aware Training for LLMs

Large language models (LLMs) are omnipresent; however, their practical deployment is challenging due to their ever-increasing computational and memory demands. Quantization is one of the most effective ways to make them more compute and…

Machine Learning · Computer Science 2024-09-04 Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel

Residual vector quantization for KV cache compression in large language model

KV cache compression methods have mainly relied on scalar quantization techniques to reduce the memory requirements during decoding. In this work, we apply residual vector quantization, which has been widely used for high fidelity audio…

Machine Learning · Computer Science 2024-10-22 Ankur Kumar

1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization

Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width…

Machine Learning · Computer Science 2026-02-18 Sohir Maskey, Constantin Eichenberg, Johannes Messner, Douglas Orr

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

The inference of large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has emerged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work,…

Machine Learning · Computer Science 2024-03-15 Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao

Standard Deviation-Based Quantization for Deep Neural Networks

Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices. Inspired by existing methods, we propose a new framework to learn the…

Machine Learning · Computer Science 2022-02-28 Amir Ardakani, Arash Ardakani, Brett Meyer, James J. Clark, Warren J. Gross

QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning

Fine-tuning large language models (LLMs) is often constrained by the computational costs of processing massive datasets. We propose QLESS (Quantized Low-rank Gradient Similarity Search), which integrates gradient quantization with…

Machine Learning · Computer Science 2025-02-05 Moses Ananta, Muhammad Farid Adilazuarda, Zayd Muhammad Kawakibi Zuhri, Ayu Purwarianti, Alham Fikri Aji

Weightless: Lossy Weight Encoding For Deep Neural Network Compression

The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such…

Machine Learning · Computer Science 2017-11-15 Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks

Retraining-Based Iterative Weight Quantization for Deep Neural Networks

Model compression has gained a lot of attention due to its ability to reduce hardware resource requirements significantly while maintaining accuracy of DNNs. Model compression is especially useful for memory-intensive recurrent neural…

Machine Learning · Computer Science 2018-05-30 Dongsoo Lee, Byeongwook Kim

WINDQuant: Weight-Informed Neural Decision-Making for Global Mixed-Precision LLM Quantization

Quantization is an effective approach to reduce the memory footprint and inference cost of large language models (LLMs), yet maintaining performance in the ultra-low-bit regime remains challenging. Existing post-training methods often…

Machine Learning · Computer Science 2026-05-27 Phong Nam Huu Nguyen, Khoi M. Le, Cong-Duy T Nguyen, Anh Tuan Luu, Thong Thanh Nguyen, Tho Quan

Quantization Loss Re-Learning Method

To quantize the gate parameters of the LSTM (Long Short-Term Memory) neural network model with almost no degradation in recognition performance, a new quantization method named Quantization Loss Re-Learn Method is proposed in this paper.…

Machine Learning · Computer Science 2019-06-03 Kunping Li

Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models

Fine-tuning is a crucial process for adapting large language models (LLMs) to diverse applications. In certain scenarios, such as multi-tenant serving, deploying multiple LLMs becomes necessary to meet complex demands. Recent studies…

Computation and Language · Computer Science 2024-11-27 Bowen Ping, Shuo Wang, Hanqing Wang, Xu Han, Yuzhuang Xu, Yukun Yan, Yun Chen, Baobao Chang, Zhiyuan Liu, Maosong Sun

When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models

Large language models (LLMs) exhibit excellent performance in various tasks. However, the memory requirements of LLMs present a great challenge when deploying on memory-limited devices, even for quantized LLMs. This paper introduces a…

Computation and Language · Computer Science 2025-02-24 Weilan Wang, Yu Mao, Dongdong Tang, Hongchao Du, Nan Guan, Chun Jason Xue

End-to-End Learned Image Compression with Quantized Weights and Activations

End-to-end learned image compression (LIC) has matched traditional hand-crafted methods such as BPG (HEVC intra) in terms of coding gain. However, the large network size prohibits the usage of LIC on resource-limited embedded…

Image and Video Processing · Electrical Eng. & Systems 2021-11-19 Heming Sun, Lu Yu, Jiro Katto

Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels

Model quantization enables efficient deployment of deep neural networks on edge devices through low-bit parameter representation, yet raises critical challenges for implementing machine unlearning (MU) under data privacy regulations.…

Machine Learning · Computer Science 2025-03-19 Yujia Tong, Yuze Wang, Jingling Yuan, Chuang Hu

Large-Scale Learning with Less RAM via Randomization

We reduce the memory footprint of popular large-scale online learning methods by projecting our weight vector onto a coarse discrete set using randomized rounding. Compared to standard 32-bit float encodings, this reduces RAM usage by more…

Machine Learning · Computer Science 2013-03-20 Daniel Golovin, D. Sculley, H. Brendan McMahan, Michael Young

"Machine LLRning": Learning to Softly Demodulate

Soft demodulation, or demapping, of received symbols back into their conveyed soft bits, or bit log-likelihood ratios (LLRs), is at the very heart of any modern receiver. In this paper, a trainable universal neural network-based demodulator…

Information Theory · Computer Science 2020-03-23 Ori Shental, Jakob Hoydis

Radio: Rate-Distortion Optimization for Large Language Model Compression

In recent years, the compression of large language models (LLMs) has emerged as a key problem in facilitating LLM deployment on resource-limited devices, reducing compute costs, and mitigating the environmental footprint due to large-scale…

Machine Learning · Computer Science 2025-05-07 Sean I. Young