Related papers: Correctly Rounded Functions For Vector Application…
The IEEE 754-2008 standard recommends the correct rounding of some elementary functions. This requires to solve the Table Maker's Dilemma which implies a huge amount of CPU computation time. We consider in this paper accelerating such…
This paper proposes a set of techniques to develop correctly rounded math libraries for 32-bit float and posit types. It enhances our RLibm approach that frames the problem of generating correctly rounded libraries as a linear programming…
For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction.…
The ever-increasing quest for data-level parallelism and variable precision in ubiquitous multimedia and Deep Neural Network (DNN) applications has motivated the use of Single Instruction, Multiple Data (SIMD) architectures. To alleviate…
Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the…
Many applications in Bayesian statistics are extremely computationally intensive. However, they are often inherently parallel, making them prime targets for modern massively parallel processors. Multi-core and distributed computing is…
With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However,…
The typical processors used for scientific computing have fixed-width data-paths. This implies that mathematical libraries were specifically developed to target each of these fixed precisions (binary16, binary32, binary64). However, to…
Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction,…
Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate parsing of common Internet…
This paper proposes a novel set of trigonometric implementations which are 5x faster than the inbuilt C++ functions. The proposed implementation is also highly memory efficient requiring no precomputations of any kind. Benchmark comparisons…
We propose novel smooth approximations to the classical rounding function, suitable for differentiable optimization and machine learning applications. Our constructions are based on two approaches: (1) localized sigmoid window functions…
An improvement on precision of recursive function simulation in IEEE floating point standard is presented. It is shown that the average of rounding towards negative infinite and rounding towards positive infinite yields a better result than…
Mainstream math libraries for floating point (FP) do not produce correctly rounded results for all inputs. In contrast, CR-LIBM and RLIBM provide correctly rounded implementations for a specific FP representation with one rounding mode.…
Two essential problems in Computer Algebra, namely polynomial factorization and polynomial greatest common divisor computation, can be efficiently solved thanks to multiple polynomial evaluations in two variables using modular arithmetic.…
Given the importance of floating-point~(FP) performance in numerous domains, several new variants of FP and its alternatives have been proposed (e.g., Bfloat16, TensorFloat32, and Posits). These representations do not have correctly rounded…
Our RLibm project generates a single implementation for an elementary function that produces correctly rounded results for multiple rounding modes and representations with up to 32-bits. They are appealing for developing fast reference…
Although mixed precision arithmetic has recently garnered interest for training dense neural networks, many other applications could benefit from the speed-ups and lower storage cost if applied appropriately. The growing interest in…
Basic computer arithmetic operations, such as $+$, $\times$, or $\div$ are correctly rounded, whilst mathematical functions such as $e^x$, $\ln(x)$, or $\sin(x)$ in general are not, meaning that separate implementations may provide…
Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the…