Related papers: stdgpu: Efficient STL-like Data Structures on the …

RTGPU: Real-Time Computing with Graphics Processing Units

In this work, we survey the role of GPUs in real-time systems. Originally designed for parallel graphics workloads, GPUs are now widely used in time-critical applications such as machine learning, autonomous vehicles, and robotics due to…

Hardware Architecture · Computer Science 2025-12-11 Atiyeh Gheibi-Fetrat, Amirsaeed Ahmadi-Tonekaboni, Farzam Koohi-Ronaghi, Pariya Hajipour, Sana Babayan-Vanestan, Fatemeh Fotouhi, Elahe Mortazavian-Farsani, Pouria Khajehpour-Dezfouli, Sepideh Safari, Shaahin Hessabi, Hamid Sarbazi-Azad

Contract-Based General-Purpose GPU Programming

Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-14 Alexey Kolesnichenko, Christopher M. Poskitt, Sebastian Nanz, Bertrand Meyer

RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with Fine-Grain Utilization

Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Since these highly parallel applications are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-07 An Zou, Jing Li, Christopher D. Gill, Xuan Zhang

Improving the performance of the linear systems solvers using CUDA

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

GPGPU Based Parallelized Client-Server Framework for Providing High Performance Computation Support

Parallel data processing has become indispensable for processing applications involving huge data sets. This brings into focus the Graphics Processing Units (GPUs) which emphasize on many-core computing. With the advent of General Purpose…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-22 Poorna Banerjee, Amit Dave

Contiguous Storage of Grid Data for Heterogeneous Computing

Structured Cartesian grids are a fundamental component in numerical simulations. Although these grids facilitate straightforward discretization schemes, their naïve use in sparse domains leads to excessive memory overhead and…

Computational Engineering, Finance, and Science · Computer Science 2025-12-15 Fan Gu, Xiangyu Hu

GPGPU Computing

Since the first idea of using GPU to general purpose computing, things have evolved over the years and now there are several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-09 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

Advanced Programming Platform for efficient use of Data Parallel Hardware

Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-26 Luis Cabellos

A Comprehensive Overview of GPU Accelerated Databases

Over the past decade, the landscape of data analytics has seen a notable shift towards heterogeneous architectures, particularly the integration of GPUs to enhance overall performance. In the realm of in-memory analytics, which often…

Databases · Computer Science 2024-06-21 Harshit Sharma, Anmol Sharma

GPGPU Processing in CUDA Architecture

The future of computation is the Graphical Processing Unit, i.e. the GPU. The promise that the graphics cards have shown in the field of image processing and accelerated rendering of 3D scenes, and the computational capability that these…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-02-21 Jayshree Ghorpade, Jitendra Parande, Madhura Kulkarni, Amit Bawaskar

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Ming Li, Ziqian Bi, Tianyang Wang, Yizhu Wen, Qian Niu, Xinyuan Song, Zekun Jiang, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Keyu Chen, Caitlyn Heqi Yin, Pohsun Feng, Ming Liu

Intra-node Memory Safe GPU Co-Scheduling

GPUs in High-Performance Computing systems remain under-utilised due to the unavailability of schedulers that can safely schedule multiple applications to share the same GPU. The research reported in this paper is motivated to improve the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-12-14 Carlos Reano, Federico Silla, Dimitrios S. Nikolopoulos, Blesson Varghese

pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations

Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of the parallel algorithms,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-12 Ruben Laso, Diego Krupitza, Sascha Hunold

Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC

Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU platforms. In that work, we acknowledged…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-08 Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin

A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems

Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even using tools like CUDA and OpenCL it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-01-11 Marek Blazewicz, Steven R. Brandt, Peter Diener, David M. Koppelman, Krzysztof Kurowski, Frank Löffler, Erik Schnetter, Jian Tao

CrystalGPU: Transparent and Efficient Utilization of GPU Power

General-purpose computing on graphics processing units (GPGPU) has recently gained considerable attention in various domains such as bioinformatics, databases and distributed computing. GPGPU is based on using the GPU as a co-processor…

Other Computer Science · Computer Science 2010-05-12 Abdullah Gharaibeh, Samer Al-Kiswany, Matei Ripeanu

Ripple: Simplified Large-Scale Computation on Heterogeneous Architectures with Polymorphic Data Layout

GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power available with multiple GPUs is challenging. The main challenges in achieving good performance are memory layout, affecting…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-20 Robert Clucas, Philip Blakely, Nikolaos Nikiforakis

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime

GPUs are readily available in cloud computing and personal devices, but their use for data processing acceleration has been slowed down by their limited integration with common programming languages such as Python or Java. Moreover, using…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-20 Alberto Parravicini, Arnaud Delamare, Marco Arnaboldi, Marco D. Santambrogio

MGPU-TSM: A Multi-GPU System with Truly Shared Memory

The sizes of GPU applications are rapidly growing. They are exhausting the compute and memory resources of a single GPU, and are demanding the move to multiple GPUs. However, the performance of these applications scales sub-linearly with…

Hardware Architecture · Computer Science 2020-08-11 Saiful A. Mojumder, Yifan Sun, Leila Delshadtehrani, Yenai Ma, Trinayan Baruah, José L. Abellán, John Kim, David Kaeli, Ajay Joshi