Related papers: Descend: A Safe GPU Systems Programming Language

GPGPU Computing

Since the first idea of using GPU to general purpose computing, things have evolved over the years and now there are several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-09 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

GPGPU Processing in CUDA Architecture

The future of computation is the Graphical Processing Unit, i.e. the GPU. The promise that the graphics cards have shown in the field of image processing and accelerated rendering of 3D scenes, and the computational capability that these…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-02-21 Jayshree Ghorpade, Jitendra Parande, Madhura Kulkarni, Amit Bawaskar

GPU Scripting and Code Generation with PyCUDA

High-level scripting languages are in many ways polar opposites to GPUs. GPUs are highly parallel, subject to hardware subtleties, and designed for maximum throughput, and they offer a tremendous advance in the performance achievable for a…

Software Engineering · Computer Science 2013-04-23 Andreas Klöckner, Nicolas Pinto, Bryan Catanzaro, Yunsup Lee, Paul Ivanov, Ahmed Fasih

Exploring Memory Persistency Models for GPUs

Given its high integration density, high speed, byte addressability, and low standby power, non-volatile or persistent memory is expected to supplement/replace DRAM as main memory. Through persistency programming models (which define…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-30 Zhen Lin, Mohammad Alshboul, Yan Solihin, Huiyang Zhou

Contract-Based General-Purpose GPU Programming

Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. An obstacle to widespread adoption, however, is the…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-14 Alexey Kolesnichenko, Christopher M. Poskitt, Sebastian Nanz, Bertrand Meyer

Improving the performance of the linear systems solvers using CUDA

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

CUDA Leaks: Information Leakage in GPU Architectures

Graphics Processing Units (GPUs) are deployed on most present server, desktop, and even mobile platforms. Nowadays, a growing number of applications leverage the high parallelism offered by this architecture to speed-up general purpose…

Cryptography and Security · Computer Science 2016-02-29 Roberto Di Pietro, Flavio Lombardi, Antonio Villani

Challenges and Design Considerations for Finding CUDA Bugs Through GPU-Native Fuzzing

Modern computing is shifting from homogeneous CPU-centric systems to heterogeneous systems with closely integrated CPUs and GPUs. While the CPU software stack has benefited from decades of memory safety hardening, the GPU software stack…

Cryptography and Security · Computer Science 2026-03-09 Mingkai Li, Joseph Devietti, Suman Jana, Tanvir Ahmed Khan

Modular GPU Programming with Typed Perspectives

To achieve peak performance on modern GPUs, one must balance two frames of mind: issuing instructions to individual threads to control their behavior, while simultaneously tracking the convergence of many threads acting in concert to…

Programming Languages · Computer Science 2025-11-18 Manya Bansal, Daniel Sainati, Joseph W. Cutler, Saman Amarasinghe, Jonathan Ragan-Kelley

Gradient descent procedure for solving linear programming relaxations of combinatorial optimization problems in parallel mode on extra large scale

Linear programming (LP) relaxation is a standard technique for solving hard combinatorial optimization (CO) problems. Here we present a gradient descent algorithm which exploits the special structure of some LP relaxations induced by CO…

Optimization and Control · Mathematics 2020-11-17 Alexey Antonov

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Ming Li, Ziqian Bi, Tianyang Wang, Yizhu Wen, Qian Niu, Xinyuan Song, Zekun Jiang, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Keyu Chen, Caitlyn Heqi Yin, Pohsun Feng, Ming Liu

A Variant of Concurrent Constraint Programming on GPU

The number of cores on graphical computing units (GPUs) is reaching thousands nowadays, whereas the clock speed of processors stagnates. Unfortunately, constraint programming solvers do not take advantage yet of GPU parallelism. One reason…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-26 Pierre Talbot, Frédéric Pinel, Pascal Bouvry

A Concurrent Language with a Uniform Treatment of Regions and Locks

A challenge for programming language research is to design and implement multi-threaded low-level languages providing static guarantees for memory safety and freedom from data races. Towards this goal, we present a concurrent language…

Programming Languages · Computer Science 2010-02-05 Prodromos Gerakios, Nikolaos Papaspyrou, Konstantinos Sagonas

Development and performance of a HemeLB GPU code for human-scale blood flow simulation

In recent years, it has become increasingly common for high performance computers (HPC) to possess some level of heterogeneous architecture - typically in the form of GPU accelerators. In some machines these are isolated within a dedicated…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-19 I. Zacharoudiou, J. W. S. McCullough, P. V. Coveney

GSGP-CUDA -- a CUDA framework for Geometric Semantic Genetic Programming

Geometric Semantic Genetic Programming (GSGP) is a state-of-the-art machine learning method based on evolutionary computation. GSGP performs search operations directly at the level of program semantics, which can be done more efficiently…

Neural and Evolutionary Computing · Computer Science 2021-06-09 Leonardo Trujillo, Jose Manuel Muñoz Contreras, Daniel E Hernandez, Mauro Castelli, Juan J Tapia

Intra-node Memory Safe GPU Co-Scheduling

GPUs in High-Performance Computing systems remain under-utilised due to the unavailability of schedulers that can safely schedule multiple applications to share the same GPU. The research reported in this paper is motivated to improve the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-12-14 Carlos Reano, Federico Silla, Dimitrios S. Nikolopoulos, Blesson Varghese

Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight

For NVIDIA GPUs, CUDA is the primary interface through which applications orchestrate GPU execution, yet much of the logic that realizes CUDA operations resides in NVIDIA's closed-source userspace driver. As a result, the translation from…

Performance · Computer Science 2026-04-30 Yuang Yan, Ian Karlin, Ryan Grant

Lightning: Scaling the GPU Programming Model Beyond a Single GPU

The GPU programming model is primarily aimed at the development of applications that run one GPU. However, this limits the scalability of GPU code to the capabilities of a single GPU in terms of compute power and memory capacity. To scale…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-03 Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Jason Maassen, Rob. V. van Nieuwpoort

Source-to-Source Transformations for GPU Code Generation

GPUs have become essential in modern high performance computing, but programming them correctly remains a significant challenge. This difficulty arises from subtle concurrency bugs that result from the explicit management of synchronization…

Programming Languages · Computer Science 2026-05-15 Julien de Castelnau, Thomas Koehler, Arthur Charguéraud, Clément Pit-Claudel

Guardian: Safe GPU Sharing in Multi-Tenant Environments

Modern GPU applications, such as machine learning (ML), can only partially utilize GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs across multiple applications from different tenants can improve resource…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-17 Manos Pavlidakis, Giorgos Vasiliadis, Stelios Mavridis, Anargyros Argyros, Antony Chazapis, Angelos Bilas