Related papers: Testing GPU Numerics: Finding Numerical Difference…
CUDA and OpenCL are two different frameworks for GPU programming. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a…
Hybrid computational architectures based on the joint power of Central Processing Units and Graphic Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering,…
GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pipeline designs based on architectures…
The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Graphics Processing Units (GPUs) are now present in supercomputers to mobile phones and tablets. GPUs are used for…
Many studies have focused on developing and improving auto-tuning algorithms for Nvidia Graphics Processing Units (GPUs), but the effectiveness and efficiency of these approaches on AMD devices have hardly been studied. This paper aims to…
Different from developing neural networks (NNs) for general-purpose processors, the development for NN chips usually faces with some hardware-specific restrictions, such as limited precision of network signals and parameters, constrained…
Numerical features of matrix multiplier hardware units in NVIDIA and AMD data centre GPUs have recently been studied. Features such as rounding, normalisation, and internal precision of the accumulators are of interest. In this paper, we…
Portability is critical to ensuring high productivity in developing and maintaining scientific software as the diversity in on-node hardware architectures increases. While several programming models provide portability for diverse GPU…
Since the first idea of using GPU to general purpose computing, things have evolved over the years and now there are several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified…
In order to obtain more accurate solutions of polynomial systems with numerical continuation methods we use multiprecision arithmetic. Our goal is to offset the overhead of double double arithmetic accelerating the path trackers and in…
Graphics Processing Unit (GPU) computing is becoming an alternate computing platform for numerical simulations. However, it is not clear which numerical scheme will provide the highest computational efficiency for different types of…
Analysis of processing time and similarity of images generated between CPU and GPU architectures and sequential and parallel programming. For image processing a computer with AMD FX-8350 processor and an Nvidia GTX 960 Maxwell GPU was used,…
Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and…
Image Processing is a specialized area of Digital Signal Processing which contains various mathematical and algebraic operations such as matrix inversion, transpose of matrix, derivative, convolution, Fourier Transform etc. Operations like…
We have developed several autotuning benchmarks in CUDA that take into account performance-relevant source-code parameters and reach near peak-performance on various GPU architectures. We have used them during the development and evaluation…
GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronizations at different levels of granularity in a single GPU. Additionally, the emergence of dense GPU nodes also calls for…
Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…
Hardware accelerators (such as Nvidia's CUDA GPUs) have tremendous promise for computational science, because they can deliver large gains in performance at relatively low cost. In this work, we focus on the use of Nvidia's Tesla GPU for…
In recent years, it has become increasingly common for high performance computers (HPC) to possess some level of heterogeneous architecture - typically in the form of GPU accelerators. In some machines these are isolated within a dedicated…
This paper presents the implementation of a HLLC finite volume solver using GPU technology for the solution of shallow water problems in two dimensions. It compares both CPU and GPU approaches for implementing all the solver's steps. The…