Related papers: Correctly Rounded Functions For Vector Application…

200 papers

The IEEE 754-2008 standard recommends the correct rounding of some elementary functions. This requires solving the Table Maker's Dilemma, which implies a huge amount of CPU computation time. We consider in this paper accelerating such…

Mathematical Software · Computer Science 2013-06-06 Pierre Fortin, Mourad Gouicem, Stef Graillat

This paper proposes a set of techniques to develop correctly rounded math libraries for 32-bit float and posit types. It enhances our RLibm approach that frames the problem of generating correctly rounded libraries as a linear programming…

Mathematical Software · Computer Science 2021-04-12 Jay P. Lim, Santosh Nagarakatte

For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Pablo Vizcaino, Georgios Ieronymakis, Nikolaos Dimou, Vassilis Papaefstathiou, Jesus Labarta, Filippo Mantovani

The ever-increasing quest for data-level parallelism and variable precision in ubiquitous multimedia and Deep Neural Network (DNN) applications has motivated the use of Single Instruction, Multiple Data (SIMD) architectures. To alleviate…

Hardware Architecture · Computer Science 2020-11-03 Zahra Ebrahimi, Salim Ullah, Akash Kumar

Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-01-30 Johannes Hofmann, Jan Treibig, Georg Hager, Gerhard Wellein

Many applications in Bayesian statistics are extremely computationally intensive. However, they are often inherently parallel, making them prime targets for modern massively parallel processors. Multi-core and distributed computing is…

Computation · Statistics 2021-05-10 David J. Warne, Scott A. Sisson, Christopher Drovandi

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-12 Hang Liu, H. Howie Huang

The typical processors used for scientific computing have fixed-width data-paths. This implies that mathematical libraries were specifically developed to target each of these fixed precisions (binary16, binary32, binary64). However, to…

Mathematical Software · Computer Science 2020-05-07 David Defour, Pablo de Oliveira Castro, Matei Istoan, Eric Petit

Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction,…

Mathematical Software · Computer Science 2024-04-02 Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller

Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate parsing of common Internet…

Data Structures and Algorithms · Computer Science 2025-06-05 Daniel Lemire

This paper proposes a novel set of trigonometric implementations which are 5x faster than the inbuilt C++ functions. The proposed implementation is also highly memory efficient requiring no precomputations of any kind. Benchmark comparisons…

Mathematical Software · Computer Science 2025-02-18 Nikhil Dev Goyal, Parth Arora

We propose novel smooth approximations to the classical rounding function, suitable for differentiable optimization and machine learning applications. Our constructions are based on two approaches: (1) localized sigmoid window functions…

Machine Learning · Computer Science 2025-04-29 Stanislav Semenov

An improvement on precision of recursive function simulation in IEEE floating point standard is presented. It is shown that the average of rounding towards negative infinite and rounding towards positive infinite yields a better result than…

Signal Processing · Electrical Engineering and Systems Science 2017-12-05 Melanie R. Silva, Erivelton G. Nepomuceno, Samir A. M. Martins

Mainstream math libraries for floating point (FP) do not produce correctly rounded results for all inputs. In contrast, CR-LIBM and RLIBM provide correctly rounded implementations for a specific FP representation with one rounding mode.…

Mathematical Software · Computer Science 2021-12-01 Jay P. Lim, Santosh Nagarakatte

Two essential problems in Computer Algebra, namely polynomial factorization and polynomial greatest common divisor computation, can be efficiently solved thanks to multiple polynomial evaluations in two variables using modular arithmetic.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-27 Pierre Fortin, Ambroise Fleury, François Lemaire, Michael Monagan

Given the importance of floating-point (FP) performance in numerous domains, several new variants of FP and its alternatives have been proposed (e.g., Bfloat16, TensorFloat32, and Posits). These representations do not have correctly rounded…

Mathematical Software · Computer Science 2020-11-23 Jay P. Lim, Mridul Aanjaneya, John Gustafson, Santosh Nagarakatte

Our RLibm project generates a single implementation for an elementary function that produces correctly rounded results for multiple rounding modes and representations with up to 32-bits. They are appealing for developing fast reference…

Mathematical Software · Computer Science 2025-06-02 Sehyeok Park, Justin Kim, Santosh Nagarakatte

Although mixed precision arithmetic has recently garnered interest for training dense neural networks, many other applications could benefit from the speed-ups and lower storage cost if applied appropriately. The growing interest in…

Numerical Analysis · Mathematics 2021-03-02 L. Minah Yang, Alyson Fox, Geoffrey Sanders

Basic computer arithmetic operations, such as $+$, $\times$, or $\div$ are correctly rounded, whilst mathematical functions such as $e^x$, $\ln(x)$, or $\sin(x)$ in general are not, meaning that separate implementations may provide…

Mathematical Software · Computer Science 2025-09-09 Mantas Mikaitis, Tejaswa Rizyal

Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the…
