Related papers: A Multi-GPU Programming Library for Real-Time Appl…

Real-Time Computation of Parameter Fitting and Image Reconstruction Using Graphical Processing Units

In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-26 Uldis Locans, Andreas Adelmann, Andreas Suter, Jannis Fischer, Werner Lustermann, Gunther Dissertori, Qiulin Wang

RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with Fine-Grain Utilization

Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Since these highly parallel applications are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-07 An Zou, Jing Li, Christopher D. Gill, Xuan Zhang

Multi-GPU Graph Analytics

We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. Directly using the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-02 Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, John D. Owens

The Fused Kernel Library: A C++ API to Develop Highly-Efficient GPU Libraries

Existing GPU libraries often struggle to fully exploit the parallel resources and on-chip memory (SRAM) of GPUs when chaining multiple GPU functions as individual kernels. While Kernel Fusion (KF) techniques like Horizontal Fusion (HF) and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-09 Oscar Amoros, Albert Andaluz, Johnny Nunez, Antonio J. Pena

Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model

A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two dimensional Ising model [T. Preis et al., J. Comp.…

Computational Physics · Physics 2010-07-22 Benjamin Block, Peter Virnau, Tobias Preis

Accelerated Computing in Magnetic Resonance Imaging -- Real-Time Imaging Using Non-Linear Inverse Reconstruction

Purpose: To develop generic optimization strategies for image reconstruction using graphical processing units (GPUs) in magnetic resonance imaging (MRI) and to exemplarily report about our experience with a highly accelerated implementation…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-08 Sebastian Schaetz, Dirk Voit, Jens Frahm, Martin Uecker

RTGPU: Real-Time Computing with Graphics Processing Units

In this work, we survey the role of GPUs in real-time systems. Originally designed for parallel graphics workloads, GPUs are now widely used in time-critical applications such as machine learning, autonomous vehicles, and robotics due to…

Hardware Architecture · Computer Science 2025-12-11 Atiyeh Gheibi-Fetrat, Amirsaeed Ahmadi-Tonekaboni, Farzam Koohi-Ronaghi, Pariya Hajipour, Sana Babayan-Vanestan, Fatemeh Fotouhi, Elahe Mortazavian-Farsani, Pouria Khajehpour-Dezfouli, Sepideh Safari, Shaahin Hessabi, Hamid Sarbazi-Azad

Concurrent CPU-GPU Task Programming using Modern C++

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-17 Tsung-Wei Huang, Yibo Lin

Improving the performance of the linear systems solvers using CUDA

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

On the energy efficiency of sparse matrix computations on multi-GPU clusters

We investigate the energy efficiency of a library designed for parallel computations with sparse matrices. The library leverages high-performance, energy-efficient Graphics Processing Unit (GPU) accelerators to enable large-scale scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-16 Massimo Bernaschi, Alessandro Celestini, Pasqua D'Ambra, Giorgio Richelli

An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator

Realistic reservoir simulation is known to be prohibitively expensive in terms of computation time when increasing the accuracy of the simulation or by enlarging the model grid size. One method to address this issue is to parallelize the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-14 Tong Dong Qiu, Andreas Thune, Vinicius Oliveira Martins, Markus Blatt, Alf Birger Rustad, Razvan Nane

stdgpu: Efficient STL-like Data Structures on the GPU

Tremendous advances in parallel computing and graphics hardware opened up several novel real-time GPU applications in the fields of computer vision, computer graphics as well as augmented reality (AR) and virtual reality (VR). Although…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-19 Patrick Stotko

Accelerating Matrix Multiplication: A Performance Comparison Between Multi-Core CPU and GPU

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari, Mudabir Qamar Ansari

Developing a High Performance Software Library with MPI and CUDA for Matrix Computations

Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general purpose computations on GPUs and a great number of applications were ported to CUDA obtaining speedups of orders of magnitude…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-09 Bogdan Oancea, Tudorel Andrei

Effective GPU Sharing Under Compiler Guidance

Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and streaming multiprocessors (SMs). GPUs are an expensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-20 Chao Chen, Chris Porter, Santosh Pande

GPU-Based Parallel Computing Methods for Medical Photoacoustic Image Reconstruction

Recent years have witnessed a rapid advancement in GPU technology, establishing it as a formidable high-performance parallel computing technology with superior floating-point computational capabilities compared to traditional CPUs. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-18 Xinyao Yi, Yuxin Qiao

Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data

While the advances in synchrotron light sources, together with the development of focusing optics and detectors, allow nanoscale ptychographic imaging of materials and biological specimens, the corresponding experiments can yield…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-04-05 Xiaodong Yu, Viktor Nikitin, Daniel J. Ching, Selin Aslan, Doga Gursoy, Tekin Bicer

A Comparison of Support Vector Machines Training GPU-Accelerated Open Source Implementations

Last several years, GPUs are used to accelerate computations in many computer science domains. We focused on GPU accelerated Support Vector Machines (SVM) training with non-linear kernel functions. We had searched for all available GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-07-21 Jan Vanek, Josef Michalek, Josef Psutka

Multiscale Universal Interface: A Concurrent Framework for Coupling Heterogeneous Solvers

Concurrently coupled numerical simulations using heterogeneous solvers are powerful tools for modeling multiscale phenomena. However, major modifications to existing codes are often required to enable such simulations, posing significant…

Computational Physics · Physics 2015-05-18 Yu-Hang Tang, Shuhei Kudo, Xin Bian, Zhen Li, George E. Karniadakis

MPPI-Generic: A CUDA Library for Stochastic Trajectory Optimization

This paper introduces a new C++/CUDA library for GPU-accelerated stochastic optimization called MPPI-Generic. It provides implementations of Model Predictive Path Integral control, Tube-Model Predictive Path Integral Control, and Robust…

Mathematical Software · Computer Science 2026-02-26 Bogdan Vlahov, Jason Gibson, Manan Gandhi, Evangelos A. Theodorou