Related papers: Accelerating Scientific Computations with Mixed Pr…

200 papers

Multiple-precision floating-point branch-free algorithms can significantly accelerate multi-component arithmetic implemented by combining hardware-based binary64 and binary32, particularly for triple- and quadruple-precision computations.…

Mathematical Software · Computer Science 2026-05-08 Tomonori Kouya

In this paper, we propose a mixed-precision convolution unit architecture which supports different integer and floating point (FP) precisions. The proposed architecture is based on low-bit inner product units and realizes higher precision…

Hardware Architecture · Computer Science 2021-01-29 Hamzah Abdel-Aziz, Ali Shafiee, Jong Hoon Shin, Ardavan Pedram, Joseph H. Hassoun

Motivated by the increasing interest in the posit numeric format, in this paper we evaluate the accuracy and efficiency of posit arithmetic in contrast to the traditional IEEE 754 32-bit floating-point (FP32) arithmetic. We first design and…

Hardware Architecture · Computer Science 2021-09-20 Stefan Dan Ciocirlan, Dumitrel Loghin, Lavanya Ramapantulu, Nicolae Tapus, Yong Meng Teo

With the increasing complexity of machine learning models, managing computational resources like memory and processing power has become a critical concern. Mixed precision techniques, which leverage different numerical precisions during…

Machine Learning · Computer Science 2026-04-20 Juyoung Yun, Sol Choi, Francois Rameau, Byungkon Kang, Zhoulai Fu

Statistical computations are becoming increasingly important. These computations often need to be performed in log-space because probabilities become extremely small due to repeated multiplications. While using logarithms effectively…

Numerical Analysis · Mathematics 2025-09-16 Tiancheng Xu, Alan L. Cox, Scott Rixner

Recent research has shown that large language models (LLMs) can utilize low-precision floating point (FP) quantization to deliver high efficiency while maintaining original model accuracy. In particular, recent works have shown the…

Hardware Architecture · Computer Science 2025-06-05 Faraz Tahmasebi, Yian Wang, Benji Y. H. Huang, Hyoukjun Kwon

Graph analytics techniques based on spectral methods process extremely large sparse matrices with millions or even billions of non-zero values. Behind these algorithms lies the Top-K sparse eigenproblem, the computation of the largest…

Hardware Architecture · Computer Science 2022-01-20 Francesco Sgherzi, Alberto Parravicini, Marco Domenico Santambrogio

In this paper we propose a mixed precision algorithm in the context of the semi-Lagrangian discontinuous Galerkin method. The performance of this approach is evaluated on a traditional dual socket workstation as well as on a Xeon Phi and an…

Mathematical Software · Computer Science 2018-08-14 Lukas Einkemmer

In this work, we provide energy-efficient architectural support for floating point accuracy. Our goal is to provide accuracy that is far greater than that provided by the processor's hardware floating point unit (FPU). Specifically, for…

Hardware Architecture · Computer Science 2013-09-30 Ralph Nathan, Bryan Anthonio, Shih-Lien Lu, Helia Naeimi, Daniel J. Sorin, Xiaobai Sun

General Matrix Multiplication (GEMM) is a fundamental operation widely used in scientific computations. Its performance and accuracy significantly impact the performance and accuracy of applications that depend on it. One such application…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-12 Fumiya Kono, Naohito Nakasato, Maho Nakata

The accuracy requirements in many scientific computing workloads result in the use of double-precision floating-point arithmetic in the execution kernels. Nevertheless, emerging real-number representations, such as posit arithmetic, show…

Hardware Architecture · Computer Science 2024-03-15 David Mallasén, Alberto A. Del Barrio, Manuel Prieto-Matias

Modern GPUs are equipped with tensor cores (TCs) that are commonly used for matrix multiplication in artificial intelligence workloads. However, because they have high computational throughput, they can lead to significant performance gains…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-01 Brian Curless, Michael Gowanlock

Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-16 Aditya Kashi, Nicholson Koukpaizan, Hao Lu, Michael Matheson, Sarp Oral, Feiyi Wang

Modern graphics computing units (GPUs) are designed and optimized to perform highly parallel numerical calculations. This parallelism has enabled (and promises) significant advantages, both in terms of energy performance and calculation. In…

Hardware Architecture · Computer Science 2021-10-26 Quentin Gallouédec

The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a dual-precision…

Hardware Architecture · Computer Science 2026-04-10 Shubham Kumar, Vijay Pratap Sharma, Vaibhav Neema, Santosh Kumar Vishvakarma

Today's PCs can directly manipulate numbers not longer than 64 bits because the size of the CPU registers and the data-path are limited. Consequently, arithmetic operations such as addition can only be performed on numbers of that length.…

Data Structures and Algorithms · Computer Science 2012-04-03 Youssef Bassil, Aziz Barbar

Mixing precisions for performance has been an ongoing trend as the modern hardware accelerators started including new, and mostly lower-precision, data formats. The advantage of using them is a great potential of performance gain and energy…

State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy. As a result, deep learning…

Machine Learning · Computer Science 2021-03-09 Pedram Zamirai, Jian Zhang, Christopher R. Aberger, Christopher De Sa

Traditional optimization methods rely on the use of single-precision floating point arithmetic, which can be costly in terms of memory size and computing power. However, mixed precision optimization techniques leverage the use of both…

Machine Learning · Computer Science 2023-09-25 Basile Lewandowski, Atli Kosson

In the early days of computing, severe memory constraints made it necessary to use lower floating-point precision. As hardware capabilities have advanced, modern systems, particularly in computational statistics and scientific computing,…

Computation · Statistics 2026-03-03 Mary Lai O. Salvana, Sameh Abdulah, Minwoo Kim, David Helmy, Ying Sun, Marc G. Genton