Related papers: Float8@2bits: Entropy Coding Enables Data-Free Mod…

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers

How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging even for powerful cloud servers due to their prohibitive memory/computation requirements. In this work, we present an…

Computation and Language · Computer Science · 2022-06-07 · Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He

MEC-Quant: Maximum Entropy Coding for Extremely Low Bit Quantization-Aware Training

Quantization-Aware Training (QAT) has attracted much attention as a way to produce efficient neural networks. Current QAT still obtains inferior performance compared with the Full Precision (FP) counterpart. In this work, we argue that quantization…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-22 · Junbiao Pang, Tianyang Cai, Baochang Zhang

Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding

The ever-growing size of neural networks poses serious challenges on resource-constrained devices, such as embedded sensors. Compression algorithms that reduce their size can mitigate these problems, provided that model performance stays…

Machine Learning · Computer Science · 2025-05-27 · Alexander Conzelmann, Robert Bamler

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

Large transformer models have demonstrated remarkable success. Post-training quantization (PTQ), which requires only a small dataset for calibration and avoids end-to-end retraining, is a promising solution for compressing these large…

Machine Learning · Computer Science · 2024-02-09 · Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu

Conditional Entropy Coding for Efficient Video Compression

We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames. Unlike prior learning-based approaches, we reduce complexity by not performing any form of explicit…

Image and Video Processing · Electrical Eng. & Systems · 2020-08-24 · Jerry Liu, Shenlong Wang, Wei-Chiu Ma, Meet Shah, Rui Hu, Pranaab Dhawan, Raquel Urtasun

EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices

Large Language Models (LLMs) achieve strong performance across tasks, but face storage and compute challenges on edge devices. We propose EntroLLM, a compression framework combining mixed quantization and entropy coding to reduce storage…

Machine Learning · Computer Science · 2026-05-05 · Arnab Sanyal, Gourav Datta, Prithwish Mukherjee, Sandeep P. Chinchali, Michael Orshansky

EfQAT: An Efficient Framework for Quantization-Aware Training

Quantization-aware training (QAT) schemes have been shown to achieve near-full precision accuracy. They accomplish this by training a quantized model for multiple epochs. This is computationally expensive, mainly because of the full…

Machine Learning · Computer Science · 2024-11-19 · Saleh Ashkboos, Bram Verhoef, Torsten Hoefler, Evangelos Eleftheriou, Martino Dazzi

Efficient Learned Image Compression without Entropy Coding

Entropy coding is widely used in typical learned image compression (LIC) that converts latents into a compact bitstream. However, entropy coding is typically sequential and becomes the coding latency bottleneck. To overcome it, we present…

Image and Video Processing · Electrical Eng. & Systems · 2026-05-25 · Hao Cao, Wenqi Guo, Zhijin Qin, Jungong Han

EfficientQuant: An Efficient Post-Training Quantization for CNN-Transformer Hybrid Models on Edge Devices

Hybrid models that combine convolutional and transformer blocks offer strong performance in computer vision (CV) tasks but are resource-intensive for edge deployment. Although post-training quantization (PTQ) can help reduce resource…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-16 · Shaibal Saha, Lanyu Xu

decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points

Quantization has emerged as one of the most promising compression technologies for deploying efficient large models for various real-time applications in recent years. Considering that the storage and IO of weights take up the vast majority of…

Machine Learning · Computer Science · 2024-04-22 · Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei Chen, Xiaogang Tian, Jinping Cai, Yang Zhang, Shouda Liu

EasyQuant: Post-training Quantization via Scale Optimization

8-bit quantization has been widely applied to accelerate network inference in various deep learning applications. There are two kinds of quantization methods: training-based quantization and post-training quantization. Training-based…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-01 · Di Wu, Qi Tang, Yongle Zhao, Ming Zhang, Ying Fu, Debing Zhang

NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models

Weight-only quantization has become a standard approach for efficiently serving large language models (LLMs). However, existing methods fail to efficiently compress models to binary (1-bit) levels, as they either require large amounts of…

Machine Learning · Computer Science · 2026-05-19 · Hyochan Chong, Dongkyu Kim, Changdong Kim, Minseop Choi

Wideband and Entropy-Aware Deep Soft Bit Quantization

Deep learning has been recently applied to physical layer processing in digital communication systems in order to improve end-to-end performance. In this work, we introduce a novel deep learning solution for soft bit quantization across…

Signal Processing · Electrical Eng. & Systems · 2021-10-20 · Marius Arvinte, Jonathan I. Tamir

Reducing The Amortization Gap of Entropy Bottleneck In End-to-End Image Compression

End-to-end deep trainable models are about to exceed the performance of the traditional handcrafted compression techniques on videos and images. The core idea is to learn a non-linear transformation, modeled as a deep neural network,…

Image and Video Processing · Electrical Eng. & Systems · 2022-09-05 · Muhammet Balcilar, Bharath Damodaran, Pierre Hellier

Optimized learned entropy coding parameters for practical neural-based image and video compression

Neural-based image and video codecs are significantly more power-efficient when weights and activations are quantized to low-precision integers. While there are general-purpose techniques for reducing quantization effects, large losses can…

Image and Video Processing · Electrical Eng. & Systems · 2023-01-26 · Amir Said, Reza Pourreza, Hoang Le

To Compress or Not? Pushing the Frontier of Lossless GenAI Model Weights Compression with Exponent Concentration

The scaling of Generative AI (GenAI) models into the hundreds of billions of parameters makes low-precision computation indispensable for efficient deployment. We argue that the fundamental solution lies in developing low-precision…

Machine Learning · Computer Science · 2025-10-06 · Zeyu Yang, Tianyi Zhang, Jianwen Xie, Chuan Li, Zhaozhuo Xu, Anshumali Shrivastava

pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training

Quantization-Aware Training from scratch has emerged as a promising approach for building efficient large language models (LLMs) with extremely low-bit weights (sub 2-bit), which can offer substantial advantages for edge deployment.…

Machine Learning · Computer Science · 2026-02-27 · Wenzheng Zhang, Bingzheng Liu, Yang Hu, Xiaoying Bai, Wentao Zhang, Bin Cui

EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs

Large language models (LLMs) have proven far superior to conventional methods in various tasks. However, their expensive computations and high memory requirements are prohibitive for deployment. Model quantization is an effective…

Artificial Intelligence · Computer Science · 2024-03-06 · Hanlin Tang, Yifu Sun, Decheng Wu, Kai Liu, Jianchen Zhu, Zhanhui Kang

Data Compression with Relative Entropy Coding

Over the last few years, machine learning unlocked previously infeasible features for compression, such as providing guarantees for users' privacy or tailoring compression to specific data statistics (e.g., satellite images or audio…

Information Theory · Computer Science · 2026-03-25 · Gergely Flamich

Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression

For neural video codecs, it is critical, yet challenging, to design an efficient entropy model that can accurately predict the probability distribution of the quantized latent representation. However, most existing video codecs directly use…

Image and Video Processing · Electrical Eng. & Systems · 2022-07-14 · Jiahao Li, Bin Li, Yan Lu