Related papers: Multigrid Methods using Block Floating Point Arith…

Training DNNs with Hybrid Block Floating Point

The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. These accelerators typically employ densely packed full precision…

Machine Learning · Computer Science 2018-12-04 Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi

BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices

Deep neural networks (DNNs) are powerful for cognitive tasks such as image classification, object detection, and scene segmentation. One drawback, however, is the significantly high computational complexity and memory consumption, which makes…

Computer Vision and Pattern Recognition · Computer Science 2024-09-26 Yongqi Xu, Yujian Lee, Gao Yi, Bosheng Liu, Yucong Chen, Peng Liu, Jigang Wu, Xiaoming Chen, Yinhe Han

A Transprecision Floating-Point Platform for Ultra-Low Power Computing

In modern low-power embedded platforms, floating-point (FP) operations emerge as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy…

Hardware Architecture · Computer Science 2017-11-29 Giuseppe Tagliavini, Stefan Mach, Davide Rossi, Andrea Marongiu, Luca Benini

Block Format Error Bounds and Optimal Block Size Selection

The amounts of data that need to be transmitted, processed, and stored by modern deep neural networks have reached truly enormous volumes in the last few years, calling for the invention of new paradigms both in hardware and software…

Machine Learning · Computer Science 2022-11-08 Ilya Soloveychik, Ilya Lyubomirsky, Xin Wang, Sudeep Bhoja

FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose a Fast First, Accurate Second…

Machine Learning · Computer Science 2021-11-01 Sai Qian Zhang, Bradley McDanel, H. T. Kung

BBAL: A Bidirectional Block Floating Point-Based Quantisation Accelerator for Large Language Models

Large language models (LLMs), with their billions of parameters, pose substantial challenges for deployment on edge devices, straining both memory capacity and computational resources. Block Floating Point (BFP) quantisation reduces memory…

Hardware Architecture · Computer Science 2025-04-23 Xiaomeng Han, Yuan Cheng, Jing Wang, Junyang Lu, Hui Wang, X. x. Zhang, Ning Xu, Dawei Yang, Zhe Jiang

DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration

The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a dual-precision…

Hardware Architecture · Computer Science 2026-04-10 Shubham Kumar, Vijay Pratap Sharma, Vaibhav Neema, Santosh Kumar Vishvakarma

Pushing the Limits of BFP on Narrow Precision LLM Inference

The substantial computational and memory demands of Large Language Models (LLMs) hinder their deployment. Block Floating Point (BFP) has proven effective in accelerating linear operations, a cornerstone of LLM workloads. However, as…

Hardware Architecture · Computer Science 2025-02-10 Hui Wang, Yuan Cheng, Xiaomeng Han, Zhengpeng Zhao, Dawei Yang, Zhe Jiang

Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators

In this paper, we propose a mixed-precision convolution unit architecture which supports different integer and floating point (FP) precisions. The proposed architecture is based on low-bit inner product units and realizes higher precision…

Hardware Architecture · Computer Science 2021-01-29 Hamzah Abdel-Aziz, Ali Shafiee, Jong Hoon Shin, Ardavan Pedram, Joseph H. Hassoun

Adaptive Block Floating-Point for Analog Deep Learning Hardware

Analog mixed-signal (AMS) devices promise faster, more energy-efficient deep neural network (DNN) inference than their digital counterparts. However, recent studies show that DNNs on AMS devices with fixed-point numbers can incur an…

Machine Learning · Computer Science 2022-05-16 Ayon Basumallik, Darius Bunandar, Nicholas Dronen, Nicholas Harris, Ludmila Levkova, Calvin McCarter, Lakshmi Nair, David Walter, David Widemann

Acceleration of multi-component multiple-precision arithmetic with branch-free algorithms and SIMD vectorization

Multiple-precision floating-point branch-free algorithms can significantly accelerate multi-component arithmetic implemented by combining hardware-based binary64 and binary32, particularly for triple- and quadruple-precision computations.…

Mathematical Software · Computer Science 2026-05-08 Tomonori Kouya

Multigrid with Linear Storage Complexity

As the discretization error for the solution of a partial differential equation (PDE) decreases, the precision required to store the corresponding coefficients naturally increases. Storing the solution's finite element coefficients…

Numerical Analysis · Mathematics 2025-11-25 Daniel Bauer, Nils Kohl, Stephen F. McCormick, Rasmus Tamstorf

Search Your Block Floating Point Scales!

Quantization has emerged as a standard technique for accelerating inference for generative models by enabling faster low-precision computations and reduced memory transfers. Recently, GPU accelerators have added first-class support for…

Machine Learning · Computer Science 2026-05-13 Tanmaey Gupta, Hayden Prairie, Xiaoxia Wu, Reyna Abhyankar, Qingyang Wu, Austin Silveria, Pragaash Ponnusamy, Jue Wang, Ben Athiwaratkun, Leon Song, Tri Dao, Daniel Y. Fu, Chris De Sa

Multigrid Primer: Basic Principles

The goal of this primer is to provide a relatively short exposition of the basics of multigrid methods, simplified by focusing on fundamental concepts in a variational setting. This is done by way of a quadratic energy minimization…

Numerical Analysis · Mathematics 2026-05-19 Stephen F. McCormick, Rasmus Tamstorf

Floating-Point Multiply-Add with Approximate Normalization for Low-Cost Matrix Engines

The widespread adoption of machine learning algorithms necessitates hardware acceleration to ensure efficient performance. This acceleration relies on custom matrix engines that operate on full or reduced-precision floating-point…

Hardware Architecture · Computer Science 2024-08-23 Kosmas Alexandridis, Christodoulos Peltekis, Dionysios Filippas, Giorgos Dimitrakopoulos

Hybrid multigrid methods for high-order discontinuous Galerkin discretizations

The present work develops hybrid multigrid methods for high-order discontinuous Galerkin discretizations of elliptic problems. Fast matrix-free operator evaluation on tensor product elements is used to devise a computationally efficient PDE…

Computational Physics · Physics 2020-06-24 Niklas Fehn, Peter Munch, Wolfgang A. Wall, Martin Kronbichler

Low-Precision Floating-Point Schemes for Neural Network Training

The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating point arithmetic to enhance neural network training in terms of…

Machine Learning · Computer Science 2018-04-17 Marc Ortiz, Adrián Cristal, Eduard Ayguadé, Marc Casas

Combined Integer and Variable Precision (CIVP) Floating Point Multiplication Architecture for FPGAs

In this paper, we propose an architecture/methodology for making FPGAs suitable for integer as well as variable precision floating point multiplication. The proposed work will be of great importance in applications which require variable…

Hardware Architecture · Computer Science 2007-11-19 Himanshu Thapliyal, Hamid R. Arabnia, Rajnish Bajpai, Kamal K. Sharma

Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic

Ootomo, Ozaki, and Yokota [Int. J. High Perform. Comput. Appl., 38 (2024), p. 297-313] have proposed a strategy to recast a floating-point matrix multiplication in terms of integer matrix products. The factors A and B are split into integer…

Numerical Analysis · Mathematics 2026-05-11 Ahmad Abdelfattah, Jack Dongarra, Massimiliano Fasi, Mantas Mikaitis, Françoise Tisseur

Accelerating Scientific Computations with Mixed Precision Algorithms

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and…

Mathematical Software · Computer Science 2015-05-13 Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julie Langou, Julien Langou, Piotr Luszczek, Stanimire Tomov