Related papers: Computing Integer Powers in Floating-Point Arithme…
We study the multiple-precision addition of two positive floating-point numbers in base 2, with exact rounding, as specified in the MPFR library, i.e. where each number has its own precision. We show how the best possible complexity (up to…
In this work, approximate eight-bit floating-point operations performed using simple integer operations are discussed. For two-bit mantissa formats, faithful rounding can always be obtained for the considered operations. For all operations,…
We propose a novel floating-point encoding scheme that builds on prior work involving fixed-point encodings. We encode floating-point numbers using two's complement fixed-point mantissas and two's complement integral exponents. We used our…
Our goal is to find accurate and efficient algorithms, when they exist, for evaluating rational expressions containing floating point numbers, and for computing matrix factorizations (like LU and the SVD) of matrices with rational…
The study addresses the problem of precision in floating-point (FP) computations. A method for estimating the errors which affect intermediate and final results is proposed and a summary of many software simulations is discussed. The basic…
Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction,…
Floating-point arithmetic (FPA) is a mechanical representation of real arithmetic (RA), where each operation is replaced with a rounded counterpart. Various numerical properties can be verified by using SMT solvers that support the logic of…
Debugging the accumulation of floating-point errors is hard; ideally, the computer should track it automatically. Here we consider a twofold approximation of an exact real by a value + error pair of floating-point numbers. Normally, value + error sum…
Floating point arithmetic allows us to use a finite machine, the digital computer, to reach conclusions about models based on continuous mathematics. In this article we work in the other direction, that is, we present examples in which…
Probabilistic model checking computes probabilities and expected values related to designated behaviours of interest in Markov models. As a formal verification approach, it is applied to critical systems; thus we trust that probabilistic…
From a theoretical point of view, finding the solution set of a system of inequalities in only two variables is easy. However, if we want to get rigorous bounds on this set with floating point arithmetic, in all possible cases, then things…
Floating-point accumulation networks (FPANs) are key building blocks used in many floating-point algorithms, including compensated summation and double-double arithmetic. FPANs are notoriously difficult to analyze, and algorithms using…
Verification of programs using floating-point arithmetic is challenging on several accounts. One of the difficulties of reasoning about such programs is due to the peculiarities of floating-point arithmetic: rounding errors, infinities,…
Modern computer architectures support low-precision arithmetic, which presents opportunities for the adoption of mixed-precision algorithms to achieve high computational throughput and reduce energy consumption. As a growing number of…
Mixed-precision computations are a hallmark of the current stage of AI, driving the progress in large language models towards efficient, locally deployable solutions. This article addresses the floating-point computation of…
The problem of exactly summing n floating-point numbers is a fundamental problem that has many applications in large-scale simulations and computational geometry. Unfortunately, due to the round-off error in standard floating-point…
There is a growing interest in the use of reduced-precision arithmetic, intensified by the recent interest in artificial intelligence, especially deep learning. Most architectures already provide reduced-precision capabilities (e.g.,…
We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient.…
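For context, the round-off error of a floating-point addition can already be recovered in software with the classic TwoSum sequence attributed to Knuth. The sketch below is that well-known software technique, not the FPADDRE instruction itself; the function name `two_sum` is chosen here for illustration, and the code assumes IEEE-754 binary64 with round-to-nearest and no overflow.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly.

    Classic TwoSum: six floating-point operations, no branch on magnitudes.
    Assumes round-to-nearest binary floating point and no overflow.
    """
    s = a + b                 # rounded sum
    b_virtual = s - a         # the part of b that actually made it into s
    a_virtual = s - b_virtual # the part of a that actually made it into s
    b_roundoff = b - b_virtual
    a_roundoff = a - a_virtual
    return s, a_roundoff + b_roundoff


if __name__ == "__main__":
    a, b = 0.1, 0.2
    s, err = two_sum(a, b)
    # Verify exactness with rational arithmetic instead of trusting a printout.
    assert Fraction(s) + Fraction(err) == Fraction(a) + Fraction(b)
    print(s, err)
```

A single hardware instruction returning `err` would replace the five operations after the initial addition, which is the kind of saving the abstract refers to for high-precision arithmetic.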
Ootomo, Ozaki, and Yokota [Int. J. High Perform. Comput. Appl., 38 (2024), p. 297-313] have proposed a strategy to recast a floating-point matrix multiplication in terms of integer matrix products. The factors A and B are split into integer…
Scientific computing programs often undergo aggressive compiler optimization to achieve high performance and efficient resource utilization. While performance is critical, we also need to ensure that these optimizations are correct. In this…