Related papers: EFloat: Entropy-coded Floating Point Format for Co…

200 papers

Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in…

Machine Learning · Computer Science 2020-02-12 Thierry Tambe, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa Reddi, Alexander Rush, David Brooks, Gu-Yeon Wei

This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language…

Large-scale AI models, such as Large Language Models (LLMs) and Diffusion Models (DMs), have grown rapidly in size, creating significant challenges for efficient deployment on resource-constrained hardware. In this paper, we introduce…

Machine Learning · Computer Science 2026-01-05 Tianyi Zhang, Mohsen Hariri, Shaochen Zhong, Vipin Chaudhary, Yang Sui, Xia Hu, Anshumali Shrivastava

Over the last few years, machine learning unlocked previously infeasible features for compression, such as providing guarantees for users' privacy or tailoring compression to specific data statistics (e.g., satellite images or audio…

Information Theory · Computer Science 2026-03-25 Gergely Flamich

In modern low-power embedded platforms, floating-point (FP) operations emerge as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy…

Hardware Architecture · Computer Science 2017-11-29 Giuseppe Tagliavini, Stefan Mach, Davide Rossi, Andrea Marongiu, Luca Benini

The scaling of Generative AI (GenAI) models into the hundreds of billions of parameters makes low-precision computation indispensable for efficient deployment. We argue that the fundamental solution lies in developing low-precision…

Machine Learning · Computer Science 2025-10-06 Zeyu Yang, Tianyi Zhang, Jianwen Xie, Chuan Li, Zhaozhuo Xu, Anshumali Shrivastava

Large Language Models (LLMs) achieve strong performance across tasks, but face storage and compute challenges on edge devices. We propose EntroLLM, a compression framework combining mixed quantization and entropy coding to reduce storage…

Machine Learning · Computer Science 2026-05-05 Arnab Sanyal, Gourav Datta, Prithwish Mukherjee, Sandeep P. Chinchali, Michael Orshansky

State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy. As a result, deep learning…

Machine Learning · Computer Science 2021-03-09 Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

This preliminary white paper proposes a novel 8-bit floating-point data format HiFloat8 (abbreviated as HiF8) for deep learning. HiF8 features tapered precision. For normal value encoding, it provides 7 exponent values with 3-bit mantissa,…

Low-precision formats have recently driven major breakthroughs in neural network (NN) training and inference by reducing the memory footprint of the NN models and improving the energy efficiency of the underlying hardware architectures.…

Hardware Architecture · Computer Science 2024-10-28 Luca Bertaccini, Gianna Paulin, Tim Fischer, Stefan Mach, Luca Benini

Post-training compression is currently divided into two contrasting regimes. On the one hand, fast, data-free, and model-agnostic methods (e.g., NF4 or HQQ) offer maximum accessibility but suffer from functional collapse at extreme…

Machine Learning · Computer Science 2026-02-02 Patrick Putzky, Martin Genzel, Mattes Mollenhauer, Sebastian Schulze, Thomas Wollmann, Stefan Dietzel

We present an $\epsilon$-bounded compression method for unit-norm embeddings that achieves 1.5$\times$ compression, 25% better than the best prior lossless method. The method exploits that spherical coordinates of high-dimensional unit…

Machine Learning · Computer Science 2026-03-27 Han Xiao

Modern deep neural network (DNN) models generally require a huge amount of weight and activation values to achieve good inference outcomes. Those data inevitably demand a massive off-chip memory capacity/bandwidth, and the situation gets…

Machine Learning · Computer Science 2021-04-27 Cheng-Wei Huang, Tim-Wei Chen, Juinn-Dar Huang

Federated learning (FL) enables collaborative model training without exposing clients' private data, but its deployment is often constrained by the communication cost of transmitting gradients between clients and the central server,…

Machine Learning · Computer Science 2025-11-11 Zhijing Ye, Sheng Di, Jiamin Wang, Zhiqing Zhong, Zhaorui Zhang, Xiaodong Yu

As deep learning models grow and deployment becomes more widespread, reducing the storage and transmission costs of neural network weights has become increasingly important. While prior work such as ZipNN has shown that lossless compression…

Machine Learning · Computer Science 2025-08-28 Anat Heilper, Doron Singer

As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed format have attracted…

Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Shivam Aggarwal, Hans Jakob Damsgaard, Alessandro Pappalardo, Giuseppe Franco, Thomas B. Preußer, Michaela Blott, Tulika Mitra

We propose a new complex block floating-point format to reduce implementation complexity. The new format achieves wordlength reduction by sharing an exponent across the block of samples, and uses box encoding for the shared exponent to…

Information Theory · Computer Science 2017-10-26 Yeong Foong Choo, Brian L. Evans, Alan Gatherer

The increasing computational and memory demands of large language models (LLMs) necessitate innovative approaches to optimize resource usage without compromising performance. This paper leverages microscaling floating-point formats, a novel…

Neural and Evolutionary Computing · Computer Science 2025-10-03 Marco Cococcioni, Dario Pagani, Federico Rossi

Fault injection attacks on embedded neural network models have been shown as a potent threat. Numerous works studied resilience of models from various points of view. As of now, there is no comprehensive study that would evaluate the…

Cryptography and Security · Computer Science 2026-04-14 Jakub Breier, Štefan Kučerák, Xiaolu Hou