Related papers: Accelerating Scientific Computations with Mixed Pr…
Multiple-precision floating-point branch-free algorithms can significantly accelerate multi-component arithmetic implemented by combining hardware-based binary64 and binary32, particularly for triple- and quadruple-precision computations.…
In this paper, we propose a mixed-precision convolution unit architecture which supports different integer and floating point (FP) precisions. The proposed architecture is based on low-bit inner product units and realizes higher precision…
Motivated by the increasing interest in the posit numeric format, in this paper we evaluate the accuracy and efficiency of posit arithmetic in contrast to the traditional IEEE 754 32-bit floating-point (FP32) arithmetic. We first design and…
With the increasing complexity of machine learning models, managing computational resources like memory and processing power has become a critical concern. Mixed precision techniques, which leverage different numerical precisions during…
Statistical computations are becoming increasingly important. These computations often need to be performed in log-space because probabilities become extremely small due to repeated multiplications. While using logarithms effectively…
Recent research has shown that large language models (LLMs) can utilize low-precision floating point (FP) quantization to deliver high efficiency while maintaining original model accuracy. In particular, recent works have shown the…
Graph analytics techniques based on spectral methods process extremely large sparse matrices with millions or even billions of non-zero values. Behind these algorithms lies the Top-K sparse eigenproblem, the computation of the largest…
In this paper we propose a mixed precision algorithm in the context of the semi-Lagrangian discontinuous Galerkin method. The performance of this approach is evaluated on a traditional dual socket workstation as well as on a Xeon Phi and an…
In this work, we provide energy-efficient architectural support for floating point accuracy. Our goal is to provide accuracy that is far greater than that provided by the processor's hardware floating point unit (FPU). Specifically, for…
General Matrix Multiplication (GEMM) is a fundamental operation widely used in scientific computations. Its performance and accuracy significantly impact the performance and accuracy of applications that depend on it. One such application…
The accuracy requirements in many scientific computing workloads result in the use of double-precision floating-point arithmetic in the execution kernels. Nevertheless, emerging real-number representations, such as posit arithmetic, show…
Modern GPUs are equipped with tensor cores (TCs) that are commonly used for matrix multiplication in artificial intelligence workloads. However, because they have high computational throughput, they can lead to significant performance gains…
Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by…
Modern graphics computing units (GPUs) are designed and optimized to perform highly parallel numerical calculations. This parallelism has enabled (and promises) significant advantages, both in terms of energy performance and calculation. In…
The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) units. This paper presents a dual-precision…
Today's PCs can directly manipulate numbers not longer than 64 bits because the size of the CPU registers and the data-path are limited. Consequently, arithmetic operations such as addition, can only be performed on numbers of that length.…
Mixing precisions for performance has been an ongoing trend as the modern hardware accelerators started including new, and mostly lower-precision, data formats. The advantage of using them is a great potential of performance gain and energy…
State-of-the-art generic low-precision training algorithms use a mix of 16-bit and 32-bit precision, creating the folklore that 16-bit hardware compute units alone are not enough to maximize model accuracy. As a result, deep learning…
Traditional optimization methods rely on the use of single-precision floating point arithmetic, which can be costly in terms of memory size and computing power. However, mixed precision optimization techniques leverage the use of both…
In the early days of computing, severe memory constraints made it necessary to use lower floating-point precision. As hardware capabilities have advanced, modern systems, particularly in computational statistics and scientific computing,…