Related papers: Correctly Rounded Functions For Vector Application…

GPU-accelerated generation of correctly-rounded elementary functions

The IEEE 754-2008 standard recommends the correct rounding of some elementary functions. This requires solving the Table Maker's Dilemma, which implies a huge amount of CPU computation time. We consider in this paper accelerating such…

Mathematical Software · Computer Science 2013-06-06 Pierre Fortin, Mourad Gouicem, Stef Graillat

RLIBM-32: High Performance Correctly Rounded Math Libraries for 32-bit Floating Point Representations

This paper proposes a set of techniques to develop correctly rounded math libraries for 32-bit float and posit types. It enhances our RLibm approach that frames the problem of generating correctly rounded libraries as a linear programming…

Mathematical Software · Computer Science 2021-04-12 Jay P. Lim, Santosh Nagarakatte

Short reasons for long vectors in HPC CPUs: a study based on RISC-V

For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Pablo Vizcaino, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, Jesus Labarta, Filippo Mantovani

SIMDive: Approximate SIMD Soft Multiplier-Divider for FPGAs with Tunable Accuracy

The ever-increasing quest for data-level parallelism and variable precision in ubiquitous multimedia and Deep Neural Network (DNN) applications has motivated the use of Single Instruction, Multiple Data (SIMD) architectures. To alleviate…

Hardware Architecture · Computer Science 2020-11-03 Zahra Ebrahimi, Salim Ullah, Akash Kumar

Comparing the Performance of Different x86 SIMD Instruction Sets for a Medical Imaging Application on Modern Multi- and Manycore Chips

Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-01-30 Johannes Hofmann, Jan Treibig, Georg Hager, Gerhard Wellein

Vector operations for accelerating expensive Bayesian computations -- a tutorial guide

Many applications in Bayesian statistics are extremely computationally intensive. However, they are often inherently parallel, making them prime targets for modern massively parallel processors. Multi-core and distributed computing is…

Computation · Statistics 2021-05-10 David J. Warne, Scott A. Sisson, Christopher Drovandi

SIMD-X: Programming and Processing of Graph Algorithms on GPUs

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-12 Hang Liu, H. Howie Huang

Custom-Precision Mathematical Library Explorations for Code Profiling and Optimization

The typical processors used for scientific computing have fixed-width data-paths. This implies that mathematical libraries were specifically developed to target each of these fixed precisions (binary16, binary32, binary64). However, to…

Mathematical Software · Computer Science 2020-05-07 David Defour, Pablo de Oliveira Castro, Matei Istoan, Eric Petit

Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root

Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction,…

Mathematical Software · Computer Science 2024-04-02 Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller

Scanning HTML at Tens of Gigabytes per Second on ARM Processors

Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate parsing of common Internet…

Data Structures and Algorithms · Computer Science 2025-06-05 Daniel Lemire

A Novel SIMD-Optimized Implementation for Fast and Memory-Efficient Trigonometric Computation

This paper proposes a novel set of trigonometric implementations which are 5x faster than the inbuilt C++ functions. The proposed implementation is also highly memory efficient requiring no precomputations of any kind. Benchmark comparisons…

Mathematical Software · Computer Science 2025-02-18 Nikhil Dev Goyal, Parth Arora

Smooth Approximations of the Rounding Function

We propose novel smooth approximations to the classical rounding function, suitable for differentiable optimization and machine learning applications. Our constructions are based on two approaches: (1) localized sigmoid window functions…

Machine Learning · Computer Science 2025-04-29 Stanislav Semenov

Note on improvement precision of recursive function simulation in floating point standard

An improvement on precision of recursive function simulation in IEEE floating point standard is presented. It is shown that the average of rounding towards negative infinity and rounding towards positive infinity yields a better result than…

Signal Processing · Electrical Eng. & Systems 2017-12-05 Melanie R. Silva, Erivelton G. Nepomuceno, Samir A. M. Martins

RLIBM-ALL: A Novel Polynomial Approximation Method to Produce Correctly Rounded Results for Multiple Representations and Rounding Modes

Mainstream math libraries for floating point (FP) do not produce correctly rounded results for all inputs. In contrast, CR-LIBM and RLIBM provide correctly rounded implementations for a specific FP representation with one rounding mode.…

Mathematical Software · Computer Science 2021-12-01 Jay P. Lim, Santosh Nagarakatte

High performance SIMD modular arithmetic for polynomial evaluation

Two essential problems in Computer Algebra, namely polynomial factorization and polynomial greatest common divisor computation, can be efficiently solved thanks to multiple polynomial evaluations in two variables using modular arithmetic.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-27 Pierre Fortin, Ambroise Fleury, François Lemaire, Michael Monagan

A Novel Approach to Generate Correctly Rounded Math Libraries for New Floating Point Representations

Given the importance of floating-point (FP) performance in numerous domains, several new variants of FP and its alternatives have been proposed (e.g., Bfloat16, TensorFloat32, and Posits). These representations do not have correctly rounded…

Mathematical Software · Computer Science 2020-11-23 Jay P. Lim, Mridul Aanjaneya, John Gustafson, Santosh Nagarakatte

RLibm-MultiRound: Correctly Rounded Math Libraries Without Worrying about the Application's Rounding Mode

Our RLibm project generates a single implementation for an elementary function that produces correctly rounded results for multiple rounding modes and representations with up to 32-bits. They are appealing for developing fast reference…

Mathematical Software · Computer Science 2025-06-02 Sehyeok Park, Justin Kim, Santosh Nagarakatte

Rounding Error Analysis of Mixed Precision Block Householder QR Algorithms

Although mixed precision arithmetic has recently garnered interest for training dense neural networks, many other applications could benefit from the speed-ups and lower storage cost if applied appropriately. The growing interest in…

Numerical Analysis · Mathematics 2021-03-02 L. Minah Yang, Alyson Fox, Geoffrey Sanders

Accuracy of Mathematical Functions in Julia

Basic computer arithmetic operations, such as $+$, $\times$, or $\div$ are correctly rounded, whilst mathematical functions such as $e^x$, $\ln(x)$, or $\sin(x)$ in general are not, meaning that separate implementations may provide…

Mathematical Software · Computer Science 2025-09-09 Mantas Mikaitis, Tejaswa Rizyal

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic

Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the…

Mathematical Software · Computer Science 2020-07-15 Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin Carson, Terry Cojean, Jack Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Sherry Li, Neil Lindquist, Yang Liu, Jennifer Loe, Piotr Luszczek, Pratik Nayak, Sri Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Ulrike Meier Yang