English
Related papers

Related papers: Multigrid Methods using Block Floating Point Arith…

200 papers

The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. These accelerators typically employ densely packed full precision…

Machine Learning · Computer Science 2018-12-04 Mario Drumond , Tao Lin , Martin Jaggi , Babak Falsafi

Deep neural networks (DNNs) are powerful for cognitive tasks such as image classification, object detection, and scene segmentation. One drawback however is the significant high computational complexity and memory consumption, which makes…

Computer Vision and Pattern Recognition · Computer Science 2024-09-26 Yongqi Xu , Yujian Lee , Gao Yi , Bosheng Liu , Yucong Chen , Peng Liu , Jigang Wu , Xiaoming Chen , Yinhe Han

In modern low-power embedded platforms, floating-point (FP) operations emerge as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy…

Hardware Architecture · Computer Science 2017-11-29 Giuseppe Tagliavini , Stefan Mach , Davide Rossi , Andrea Marongiu , Luca Benini

The amounts of data that need to be transmitted, processed, and stored by the modern deep neural networks have reached truly enormous volumes in the last few years calling for the invention of new paradigms both in hardware and software…

Machine Learning · Computer Science 2022-11-08 Ilya Soloveychik , Ilya Lyubomirsky , Xin Wang , Sudeep Bhoja

Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose a Fast First, Accurate Second…

Machine Learning · Computer Science 2021-11-01 Sai Qian Zhang , Bradley McDanel , H. T. Kung

Large language models (LLMs), with their billions of parameters, pose substantial challenges for deployment on edge devices, straining both memory capacity and computational resources. Block Floating Point (BFP) quantisation reduces memory…

Hardware Architecture · Computer Science 2025-04-23 Xiaomeng Han , Yuan Cheng , Jing Wang , Junyang Lu , Hui Wang , X. x. Zhang , Ning Xu , Dawei Yang , Zhe Jiang

The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a dual-precision…

Hardware Architecture · Computer Science 2026-04-10 Shubham Kumar , Vijay Pratap Sharma , Vaibhav Neema , Santosh Kumar Vishvakarma

The substantial computational and memory demands of Large Language Models (LLMs) hinder their deployment. Block Floating Point (BFP) has proven effective in accelerating linear operations, a cornerstone of LLM workloads. However, as…

Hardware Architecture · Computer Science 2025-02-10 Hui Wang , Yuan Cheng , Xiaomeng Han , Zhengpeng Zhao , Dawei Yang , Zhe Jiang

In this paper, we propose a mixed-precision convolution unit architecture which supports different integer and floating point (FP) precisions. The proposed architecture is based on low-bit inner product units and realizes higher precision…

Hardware Architecture · Computer Science 2021-01-29 Hamzah Abdel-Aziz , Ali Shafiee , Jong Hoon Shin , Ardavan Pedram , Joseph H. Hassoun

Analog mixed-signal (AMS) devices promise faster, more energy-efficient deep neural network (DNN) inference than their digital counterparts. However, recent studies show that DNNs on AMS devices with fixed-point numbers can incur an…

Multiple-precision floating-point branch-free algorithms can significantly accelerate multi-component arithmetic implemented by combining hardware-based binary64 and binary32, particularly for triple- and quadruple-precision computations.…

Mathematical Software · Computer Science 2026-05-08 Tomonori Kouya

As the discretization error for the solution of a partial differential equation (PDE) decreases, the precision required to store the corresponding coefficients naturally increases. Storing the solution's finite element coefficients…

Numerical Analysis · Mathematics 2025-11-25 Daniel Bauer , Nils Kohl , Stephen F. McCormick , Rasmus Tamstorf

Quantization has emerged as a standard technique for accelerating inference for generative models by enabling faster low-precision computations and reduced memory transfers. Recently, GPU accelerators have added first-class support for…

The goal of this primer is to provide a relatively short exposition of the basics of multigrid methods, simplified by focusing on fundamental concepts in a variational setting. This is done by way of a quadratic energy minimization…

Numerical Analysis · Mathematics 2026-05-19 Stephen F. McCormick , Rasmus Tamstorf

The widespread adoption of machine learning algorithms necessitates hardware acceleration to ensure efficient performance. This acceleration relies on custom matrix engines that operate on full or reduced-precision floating-point…

Hardware Architecture · Computer Science 2024-08-23 Kosmas Alexandridis , Christodoulos Peltekis , Dionysios Filippas , Giorgos Dimitrakopoulos

The present work develops hybrid multigrid methods for high-order discontinuous Galerkin discretizations of elliptic problems. Fast matrix-free operator evaluation on tensor product elements is used to devise a computationally efficient PDE…

Computational Physics · Physics 2020-06-24 Niklas Fehn , Peter Munch , Wolfgang A. Wall , Martin Kronbichler

The use of low-precision fixed-point arithmetic along with stochastic rounding has been proposed as a promising alternative to the commonly used 32-bit floating point arithmetic to enhance training neural networks training in terms of…

Machine Learning · Computer Science 2018-04-17 Marc Ortiz , Adrián Cristal , Eduard Ayguadé , Marc Casas

In this paper, we propose an architecture/methodology for making FPGAs suitable for integer as well as variable precision floating point multiplication. The proposed work will of great importance in applications which requires variable…

Hardware Architecture · Computer Science 2007-11-19 Himanshu Thapliyal , Hamid R. Arabnia , Rajnish Bajpai , Kamal K. Sharma

Ootomo, Ozaki, and Yokota [Int. J. High Perform. Comput. Appl., 38 (2024), p. 297-313] have proposed a strategy to recast a floating-point matrix multiplication in terms of integer matrix products. The factors A and B are split into integer…

Numerical Analysis · Mathematics 2026-05-11 Ahmad Abdelfattah , Jack Dongarra , Massimiliano Fasi , Mantas Mikaitis , Françoise Tisseur

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and…

Mathematical Software · Computer Science 2015-05-13 Marc Baboulin , Alfredo Buttari , Jack Dongarra , Jakub Kurzak , Julie Langou , Julien Langou , Piotr Luszczek , Stanimire Tomov
‹ Prev 1 2 3 10 Next ›