Related papers: Computing Integer Powers in Floating-Point Arithme…

The Generic Multiple-Precision Floating-Point Addition With Exact Rounding (as in the MPFR Library)

We study the multiple-precision addition of two positive floating-point numbers in base 2, with exact rounding, as specified in the MPFR library, i.e. where each number has its own precision. We show how the best possible complexity (up to…

Data Structures and Algorithms · Computer Science 2016-08-16 Vincent Lefèvre

On Approximate 8-bit Floating-Point Operations Using Integer Operations

In this work, approximate eight-bit floating-point operations performed using simple integer operations are discussed. For two-bit mantissa formats, faithful rounding can always be obtained for the considered operations. For all operations,…

Hardware Architecture · Computer Science 2024-06-27 Theodor Lindberg, Oscar Gustafsson

Efficient Floating-Point Arithmetic on Fault-Tolerant Quantum Computers

We propose a novel floating-point encoding scheme that builds on prior work involving fixed-point encodings. We encode floating-point numbers using Two's Complement fixed-point mantissas and Two's Complement integral exponents. We used our…

Quantum Physics · Physics 2025-10-24 José E. Cruz Serrallés, Oluwadara Ogunkoya, Doğa Murat Kürkçüoğlu, Nicholas Bornman, Norm M. Tubman, Anna Grassellino, Silvia Zorzetti, Riccardo Lattanzi

The complexity of accurate floating point computation

Our goal is to find accurate and efficient algorithms, when they exist, for evaluating rational expressions containing floating point numbers, and for computing matrix factorizations (like LU and the SVD) of matrices with rational…

Numerical Analysis · Mathematics 2025-10-20 James Demmel

Floating-Point Numbers with Error Estimates (revised)

The study addresses the problem of precision in floating-point (FP) computations. A method for estimating the errors which affect intermediate and final results is proposed and a summary of many software simulations is discussed. The basic…

Numerical Analysis · Computer Science 2012-01-31 Glauco Masotti

Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root

Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction,…

Mathematical Software · Computer Science 2024-04-02 Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller

Approximate Translation from Floating-Point to Real-Interval Arithmetic

Floating-point arithmetic (FPA) is a mechanical representation of real arithmetic (RA), where each operation is replaced with a rounded counterpart. Various numerical properties can be verified by using SMT solvers that support the logic of…

Logic in Computer Science · Computer Science 2021-12-07 Daisuke Ishii, Takashi Tomita, Toshiaki Aoki

Twofold fast summation

Debugging accumulation of floating-point errors is hard; ideally, a computer should track it automatically. Here we consider twofold approximation of an exact real with a value + error pair of floating-point numbers. Normally, value + error sum…

Numerical Analysis · Computer Science 2014-01-06 Evgeny Latkin

Floating point numbers are real numbers

Floating point arithmetic allows us to use a finite machine, the digital computer, to reach conclusions about models based on continuous mathematics. In this article we work in the other direction, that is, we present examples in which…

Numerical Analysis · Mathematics 2017-10-05 Walter F. Mascarenhas

Correct Probabilistic Model Checking with Floating-Point Arithmetic

Probabilistic model checking computes probabilities and expected values related to designated behaviours of interest in Markov models. As a formal verification approach, it is applied to critical systems; thus we trust that probabilistic…

Logic in Computer Science · Computer Science 2021-10-19 Arnd Hartmanns

Solving systems of inequalities in two variables with floating point arithmetic

From a theoretical point of view, finding the solution set of a system of inequalities in only two variables is easy. However, if we want to get rigorous bounds on this set with floating point arithmetic, in all possible cases, then things…

Data Structures and Algorithms · Computer Science 2021-09-21 Walter F. Mascarenhas

Automatic Verification of Floating-Point Accumulation Networks

Floating-point accumulation networks (FPANs) are key building blocks used in many floating-point algorithms, including compensated summation and double-double arithmetic. FPANs are notoriously difficult to analyze, and algorithms using…

Numerical Analysis · Mathematics 2025-05-27 David K. Zhang , Alex Aiken

Correct Approximation of IEEE 754 Floating-Point Arithmetic for Program Verification

Verification of programs using floating-point arithmetic is challenging on several accounts. One of the difficulties of reasoning about such programs is due to the peculiarities of floating-point arithmetic: rounding errors, infinities,…

Programming Languages · Computer Science 2022-06-23 Roberto Bagnara, Abramo Bagnara, Fabio Biselli, Michele Chiari, Roberta Gori

Deterministic and Probabilistic Rounding Error Analysis for Mixed-Precision Arithmetic on Modern Computing Units

Modern computer architectures support low-precision arithmetic, which presents opportunities for the adoption of mixed-precision algorithms to achieve high computational throughput and reduce energy consumption. As a growing number of…

Computation · Statistics 2024-12-02 Sahil Bhola , Karthik Duraisamy

LAMP: Look-Ahead Mixed-Precision Inference of Large Language Models

Mixed-precision computations are a hallmark of the current stage of AI, driving the progress in large language models towards efficient, locally deployable solutions. This article addresses the floating-point computation of…

Machine Learning · Computer Science 2026-05-08 Stanislav Budzinskiy, Marian Gloser, Tolunay Yilmaz, Ying Hong Tham, Yuanyi Lin, Wenyi Fang, Fan Wu, Philipp Petersen

Parallel Algorithms for Summing Floating-Point Numbers

The problem of exactly summing n floating-point numbers is a fundamental problem that has many applications in large-scale simulations and computational geometry. Unfortunately, due to the round-off error in standard floating-point…

Data Structures and Algorithms · Computer Science 2016-05-19 Michael T. Goodrich, Ahmed Eldawy

Customizing Number Representation and Precision

There is a growing interest in the use of reduced-precision arithmetic, exacerbated by the recent interest in artificial intelligence, especially with deep learning. Most architectures already provide reduced-precision capabilities (e.g.,…

Hardware Architecture · Computer Science 2022-12-09 Olivier Sentieys, Daniel Menard

Wanted: Floating-Point Add Round-off Error instruction

We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient.…

Numerical Analysis · Computer Science 2016-03-03 Marat Dukhan, Richard Vuduc, Jason Riedy

Analysis of Floating-Point Matrix Multiplication Computed via Integer Arithmetic

Ootomo, Ozaki, and Yokota [Int. J. High Perform. Comput. Appl., 38 (2024), p. 297-313] have proposed a strategy to recast a floating-point matrix multiplication in terms of integer matrix products. The factors A and B are split into integer…

Numerical Analysis · Mathematics 2026-05-11 Ahmad Abdelfattah, Jack Dongarra, Massimiliano Fasi, Mantas Mikaitis, Françoise Tisseur

Towards Verified Compilation of Floating-point Optimization in Scientific Computing Programs

Scientific computing programs often undergo aggressive compiler optimization to achieve high performance and efficient resource utilization. While performance is critical, we also need to ensure that these optimizations are correct. In this…

Programming Languages · Computer Science 2025-09-12 Mohit Tekriwal, John Sarracino