English
Related papers

Related papers: FPRaker: A Processing Element For Accelerating Neu…

200 papers

The widespread adoption of machine learning algorithms necessitates hardware acceleration to ensure efficient performance. This acceleration relies on custom matrix engines that operate on full or reduced-precision floating-point…

Hardware Architecture · Computer Science 2024-08-23 Kosmas Alexandridis , Christodoulos Peltekis , Dionysios Filippas , Giorgos Dimitrakopoulos

In todays world, high-power computing applications such as image processing, digital signal processing, graphics, and robotics require enormous computing power. These applications use matrix operations, especially matrix multiplication.…

Hardware Architecture · Computer Science 2019-10-29 Arish S , R. K. Sharma

In modern low-power embedded platforms, floating-point (FP) operations emerge as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy…

Hardware Architecture · Computer Science 2017-11-29 Giuseppe Tagliavini , Stefan Mach , Davide Rossi , Andrea Marongiu , Luca Benini

Training deep neural networks (DNNs) is a computationally expensive job, which can take weeks or months even with high performance GPUs. As a remedy for this challenge, community has started exploring the use of more efficient data…

Machine Learning · Computer Science 2022-03-15 Seock-Hwan Noh , Jahyun Koo , Seunghyun Lee , Jongse Park , Jaeha Kung

The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a dual-precision…

Hardware Architecture · Computer Science 2026-04-10 Shubham Kumar , Vijay Pratap Sharma , Vaibhav Neema , Santosh Kumar Vishvakarma

Machine learning has recently gained traction as a way to overcome the slow accelerator generation and implementation process on an FPGA. It can be used to build performance and resource usage models that enable fast early-stage design…

Hardware Architecture · Computer Science 2022-10-04 Gagandeep Singh , Dionysios Diamantopoulos , Juan Gómez-Luna , Sander Stuijk , Henk Corporaal , Onur Mutlu

We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient.…

Numerical Analysis · Computer Science 2016-03-03 Marat Dukhan , Richard Vuduc , Jason Riedy

Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose a Fast First, Accurate Second…

Machine Learning · Computer Science 2021-11-01 Sai Qian Zhang , Bradley McDanel , H. T. Kung

This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language…

Traditional optimization methods rely on the use of single-precision floating point arithmetic, which can be costly in terms of memory size and computing power. However, mixed precision optimization techniques leverage the use of both…

Machine Learning · Computer Science 2023-09-25 Basile Lewandowski , Atli Kosson

In this work, we provide energy-efficient architectural support for floating point accuracy. Our goal is to provide accuracy that is far greater than that provided by the processor's hardware floating point unit (FPU). Specifically, for…

Hardware Architecture · Computer Science 2013-09-30 Ralph Nathan , Bryan Anthonio , Shih-Lien Lu , Helia Naeimi , Daniel J. Sorin , Xiaobai Sun

High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive…

Hardware Architecture · Computer Science 2015-04-20 Syed Waqar Nabi , Saji N. Hameed , Wim Vanderbauwhede

Training Deep Neural Networks (DNNs) can be computationally demanding, particularly when dealing with large models. Recent work has aimed to mitigate this computational challenge by introducing 8-bit floating-point (FP8) formats for…

Hardware Architecture · Computer Science 2024-09-27 Sami Ben Ali , Silviu-Ioan Filip , Olivier Sentieys

The acceleration of deep-learning kernels in hardware relies on matrix multiplications that are executed efficiently on Systolic Arrays (SA). To effectively trade off deep-learning training/inference quality with hardware cost, SA…

Hardware Architecture · Computer Science 2023-09-11 D. Filippas , C. Peltekis , G. Dimitrakopoulos , C. Nicopoulos

Matrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphical Processing Units (GPUs) provide it implemented in hardware. Due to the increased throughput…

Mathematical Software · Computer Science 2026-04-07 Faizan A. Khattak , Mantas Mikaitis

In this paper, we propose a mixed-precision convolution unit architecture which supports different integer and floating point (FP) precisions. The proposed architecture is based on low-bit inner product units and realizes higher precision…

Hardware Architecture · Computer Science 2021-01-29 Hamzah Abdel-Aziz , Ali Shafiee , Jong Hoon Shin , Ardavan Pedram , Joseph H. Hassoun

Mixing precisions for performance has been an ongoing trend as the modern hardware accelerators started including new, and mostly lower-precision, data formats. The advantage of using them is a great potential of performance gain and energy…

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science 2026-01-09 Chuanzhen Wang , Leo Zhang , Eric Liu

Floating-point accumulation networks (FPANs) are key building blocks used in many floating-point algorithms, including compensated summation and double-double arithmetic. FPANs are notoriously difficult to analyze, and algorithms using…

Numerical Analysis · Mathematics 2025-05-27 David K. Zhang , Alex Aiken

Accumulation-based operations, such as summation and matrix multiplication, are fundamental to numerous computational domains. However, their accumulation orders are often undocumented in existing software and hardware implementations,…

Numerical Analysis · Mathematics 2025-07-02 Peichen Xie , Yanjie Gao , Yang Wang , Jilong Xue
‹ Prev 1 2 3 10 Next ›