Related papers: Testing GPU Numerics: Finding Numerical Difference…

A Performance Comparison of CUDA and OpenCL

CUDA and OpenCL are two different frameworks for GPU programming. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a…

Performance · Computer Science 2011-05-17 Kamran Karimi, Neil G. Dickson, Firas Hamze

A Performance Comparison of Different Graphics Processing Units Running Direct N-Body Simulations

Hybrid computational architectures based on the joint power of Central Processing Units and Graphic Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering,…

Instrumentation and Methods for Astrophysics · Physics 2015-06-15 Roberto Capuzzo-Dolcetta, Mario Spera

Analyzing Modern NVIDIA GPU cores

GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pipeline designs based on architectures…

Hardware Architecture · Computer Science 2025-10-30 Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González

Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs

The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Graphics Processing Units (GPUs) are now present in everything from supercomputers to mobile phones and tablets. GPUs are used for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Yehia Arafa, Abdel-Hameed Badawy, Gopinath Chennupati, Nandakishore Santhi, Stephan Eidenbenz

Bringing Auto-tuning to HIP: Analysis of Tuning Impact and Difficulty on AMD and Nvidia GPUs

Many studies have focused on developing and improving auto-tuning algorithms for Nvidia Graphics Processing Units (GPUs), but the effectiveness and efficiency of these approaches on AMD devices have hardly been studied. This paper aims to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-17 Milo Lurati, Stijn Heldens, Alessio Sclocco, Ben van Werkhoven

Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler

Different from developing neural networks (NNs) for general-purpose processors, the development for NN chips usually faces some hardware-specific restrictions, such as limited precision of network signals and parameters, constrained…

Neural and Evolutionary Computing · Computer Science 2018-01-19 Yu Ji, YouHui Zhang, WenGuang Chen, Yuan Xie

Generalized Methodology for Determining Numerical Features of Hardware Floating-Point Matrix Multipliers: Part I

Numerical features of matrix multiplier hardware units in NVIDIA and AMD data centre GPUs have recently been studied. Features such as rounding, normalisation, and internal precision of the accumulators are of interest. In this paper, we…

Hardware Architecture · Computer Science 2025-10-21 Faizan A Khattak, Mantas Mikaitis

Taking GPU Programming Models to Task for Performance Portability

Portability is critical to ensuring high productivity in developing and maintaining scientific software as the diversity in on-node hardware architectures increases. While several programming models provide portability for diverse GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-08 Joshua H. Davis, Pranav Sivaraman, Joy Kitson, Konstantinos Parasyris, Harshitha Menon, Isaac Minn, Giorgis Georgakoudis, Abhinav Bhatele

GPGPU Computing

Since the idea of using GPUs for general-purpose computing first emerged, the field has evolved over the years, and there are now several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-09 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

Evaluating polynomials in several variables and their derivatives on a GPU computing processor

In order to obtain more accurate solutions of polynomial systems with numerical continuation methods we use multiprecision arithmetic. Our goal is to offset the overhead of double double arithmetic accelerating the path trackers and in…

Mathematical Software · Computer Science 2012-01-04 Jan Verschelde, Genady Yoffe

A Comparative Study of 2D Numerical Methods with GPU Computing

Graphics Processing Unit (GPU) computing is becoming an alternate computing platform for numerical simulations. However, it is not clear which numerical scheme will provide the highest computational efficiency for different types of…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-07 Ben J. Zimmerman, Jonathan D. Regele, Bong Wie

Performance evaluation in the reconstruction of 2D images of computed tomography using massively parallel programming CUDA

Analysis of processing time and similarity of images generated between CPU and GPU architectures and sequential and parallel programming. For image processing a computer with AMD FX-8350 processor and an Nvidia GTX 960 Maxwell GPU was used,…

Medical Physics · Physics 2022-02-10 Alexssandro Ferreira Cordeiro, Pedro Luiz de Paula Filho, Hamilton Pereira da Silva, Arnaldo Candido Junior, Edresson Casanova, Jandrei Sartori Spancerski

Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and…

Performance · Computer Science 2017-11-07 G. D. Balogh, I. Z. Reguly, G. R. Mudalige

Performance Comparison Between OpenCV Built in CPU and GPU Functions on Image Processing Operations

Image Processing is a specialized area of Digital Signal Processing which contains various mathematical and algebraic operations such as matrix inversion, transpose of matrix, derivative, convolution, Fourier Transform etc. Operations like…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-24 Batuhan Hangün, Önder Eyecioğlu

Searching CUDA code autotuning spaces with hardware performance counters: data from benchmarks running on various GPU architectures

We have developed several autotuning benchmarks in CUDA that take into account performance-relevant source-code parameters and reach near peak-performance on various GPU architectures. We have used them during the development and evaluation…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-11 Jiří Filipovič, Jana Hozzová, Amin Nezarat, Jaroslav Oľha, Filip Petrovič

A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs

GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronizations at different levels of granularity in a single GPU. Additionally, the emergence of dense GPU nodes also calls for…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-14 Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

High-Precision Numerical Simulations of Rotating Black Holes Accelerated by CUDA

Hardware accelerators (such as Nvidia's CUDA GPUs) have tremendous promise for computational science, because they can deliver large gains in performance at relatively low cost. In this work, we focus on the use of Nvidia's Tesla GPU for…

Computational Physics · Physics 2010-06-04 Rakesh Ginjupalli, Gaurav Khanna

Development and performance of a HemeLB GPU code for human-scale blood flow simulation

In recent years, it has become increasingly common for high performance computers (HPC) to possess some level of heterogeneous architecture - typically in the form of GPU accelerators. In some machines these are isolated within a dedicated…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-19 I. Zacharoudiou, J. W. S. McCullough, P. V. Coveney

A GPU-enabled finite volume solver for large shallow water simulations

This paper presents the implementation of an HLLC finite volume solver using GPU technology for the solution of shallow water problems in two dimensions. It compares both CPU and GPU approaches for implementing all the solver's steps. The…

Computational Engineering, Finance, and Science · Computer Science 2018-07-03 Fabrice Zaoui