Related papers: Implementation of float-float operators on graphic…

Vectorization of Multibyte Floating Point Data Formats

We propose a scheme for reduced-precision representation of floating point data on a continuum between IEEE-754 floating point types. Our scheme enables the use of lower precision formats for a reduction in storage space requirements and…

Mathematical Software · Computer Science 2017-01-31 Andrew Anderson, David Gregg

Caractéristiques arithmétiques des processeurs graphiques

Graphics processing units (GPUs) are now powerful and flexible processors. The latest generations of GPUs contain programmable vertex processing units (vertex…

Mathematical Software · Computer Science 2007-05-23 Marc Daumas, Guillaume Da Graça, David Defour

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD)…

Computational Physics · Physics 2011-04-08 Peter H. Colberg, Felix Höfling

Performance and Numerical Aspects of Decompositional Factorizations with FP64 Floating-Point Emulation in INT8

Mixing precisions for performance has been an ongoing trend as the modern hardware accelerators started including new, and mostly lower-precision, data formats. The advantage of using them is a great potential of performance gain and energy…

Numerical Analysis · Mathematics 2025-09-30 Piotr Luszczek, Vijay Gadepally, LaToya Anderson, William Arcand, David Bestor, William Bergeron, Alex Bonn, Daniel J. Burrill, Chansup Byun, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Peter Michaleas, Guillermo Morales, Julia Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee, Jeremy Kepner

Benchmarking and Implementation of Probability-Based Simulations on Programmable Graphics Cards

The latest Graphics Processing Units (GPUs) are reported to reach up to 200 billion floating point operations per second (200 Gflops) and to have price performance of 0.1 cents per M flop. These facts raise great interest in the…

Graphics · Computer Science 2016-08-31 S. Tomov, M. McGuigan, R. Bennett, G. Smith, J. Spiletic

Recycled Error Bits: Energy-Efficient Architectural Support for Higher Precision Floating Point

In this work, we provide energy-efficient architectural support for floating point accuracy. Our goal is to provide accuracy that is far greater than that provided by the processor's hardware floating point unit (FPU). Specifically, for…

Hardware Architecture · Computer Science 2013-09-30 Ralph Nathan, Bryan Anthonio, Shih-Lien Lu, Helia Naeimi, Daniel J. Sorin, Xiaobai Sun

A Transprecision Floating-Point Platform for Ultra-Low Power Computing

In modern low-power embedded platforms, floating-point (FP) operations emerge as a major contributor to the energy consumption of compute-intensive applications with large dynamic range. Experimental evidence shows that 50% of the energy…

Hardware Architecture · Computer Science 2017-11-29 Giuseppe Tagliavini, Stefan Mach, Davide Rossi, Andrea Marongiu, Luca Benini

A Short Note on Gaussian Process Modeling for Large Datasets using Graphics Processing Units

The graphics processing unit (GPU) has emerged as a powerful and cost effective processor for general performance computing. GPUs are capable of an order of magnitude more floating-point operations per second as compared to modern central…

Computation · Statistics 2012-07-24 Mark Franey, Pritam Ranjan, Hugh Chipman

QCD on GPUs: cost effective supercomputing

The exponential growth of floating point power in graphics processing units (GPUs), together with their low cost, has given rise to an attractive platform upon which to deploy lattice QCD calculations. GPUs are essentially many (O(100))…

High Energy Physics - Lattice · Physics 2010-11-05 M. A. Clark

Mixed precision in Graphics Processing Unit

Modern graphics computing units (GPUs) are designed and optimized to perform highly parallel numerical calculations. This parallelism has enabled (and promises) significant advantages, both in terms of energy performance and calculation. In…

Hardware Architecture · Computer Science 2021-10-26 Quentin Gallouédec

Graphic processors to speed-up simulations for the design of high performance solar receptors

Graphics Processing Units (GPUs) are now powerful and flexible systems adapted and used for other purposes than graphics calculations (General Purpose computation on GPU -- GPGPU). We present here a prototype to be integrated into…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-06-13 Sylvain Collange, Marc Daumas, David Defour

Efficient Floating-Point Givens Rotation Unit

High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, using floating-point computation is becoming almost compulsory. However, there are scarce…

Hardware Architecture · Computer Science 2020-10-26 Javier Hormigo, Sergio D. Muñoz

Graphics Processing Units and High-Dimensional Optimization

This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many…

Computation · Statistics 2015-03-13 Hua Zhou, Kenneth Lange, Marc A. Suchard

Accelerating Scientific Computations with Mixed Precision Algorithms

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and…

Mathematical Software · Computer Science 2015-05-13 Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julie Langou, Julien Langou, Piotr Luszczek, Stanimire Tomov

Advanced Programming Platform for efficient use of Data Parallel Hardware

Graphics processing units (GPUs) have evolved from specialized hardware capable of rendering high-quality graphics in games into commodity hardware for effectively processing blocks of data in a parallel schema. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-26 Luis Cabellos

Enabling predictable parallelism in single-GPU systems with persistent CUDA threads

Graphics Processing Units, or GPUs, have been successfully adopted both for graphics computation in 3D applications and for general-purpose applications (GP-GPUs), thanks to their tremendous performance-per-watt. Recently, there is a big…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-03 Paolo Burgio

Formally verified 32- and 64-bit integer division using double-precision floating-point arithmetic

Some recent processors are not equipped with an integer division unit. Compilers then implement division by a call to a special function supplied by the processor designers, which implements division by a loop producing one bit of quotient…

Logic in Computer Science · Computer Science 2022-07-19 David Monniaux, Alice Pain

Proposal for a High Precision Tensor Processing Unit

This whitepaper proposes the design and adoption of a new generation of Tensor Processing Unit which has the performance of Google's TPU, yet performs operations on wide precision data. The new generation TPU is made possible by…

Hardware Architecture · Computer Science 2017-06-13 Eric B. Olsen

Expressive Power of ReLU and Step Networks under Floating-Point Operations

The study of the expressive power of neural networks has investigated the fundamental limits of neural networks. Most existing results assume real-valued inputs and parameters as well as exact operations during the evaluation of neural…

Machine Learning · Computer Science 2024-07-17 Yeachan Park, Geonho Hwang, Wonyeol Lee, Sejun Park

A Mixed Precision, Multi-GPU Design for Large-scale Top-K Sparse Eigenproblems

Graph analytics techniques based on spectral methods process extremely large sparse matrices with millions or even billions of non-zero values. Behind these algorithms lies the Top-K sparse eigenproblem, the computation of the largest…

Hardware Architecture · Computer Science 2022-01-20 Francesco Sgherzi, Alberto Parravicini, Marco Domenico Santambrogio