Related papers: RAFI -- A Ray/Work Forwarding Infrastructure for D…

GigaAPI for GPU Parallelization

GigaAPI is a user-space API that simplifies multi-GPU programming, bridging the gap between the capabilities of parallel GPU systems and the ability of developers to harness their full potential. The API offers a comprehensive set of…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-03 M. Suvarna, O. Tehrani

GPGPU Based Parallelized Client-Server Framework for Providing High Performance Computation Support

Parallel data processing has become indispensable for applications involving huge data sets. This brings into focus Graphics Processing Units (GPUs), which emphasize many-core computing. With the advent of General Purpose…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-22 Poorna Banerjee, Amit Dave

New Parallel computing framework for radiation transport codes

A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of…

Accelerator Physics · Physics 2012-02-13 M. A. Kostin, N. V. Mokhov, K. Niita

HDArray: Parallel Array Interface for Distributed Heterogeneous Devices

Heterogeneous clusters with nodes containing one or more accelerators, such as GPUs, have become common. While MPI provides inter-address space communication, and OpenCL provides a process with access to heterogeneous computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-19 Hyun Dok Cho, Okwan Kwon, Samuel P. Midkiff

Runtime Support for Performance Portability on Heterogeneous Distributed Platforms

Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware. This shift in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-09 Polykarpos Thomadakis, Nikos Chrisochoides

Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs

For a deep learning model, efficient execution of its computation graph is key to achieving high performance. Previous work has focused on improving the performance for individual nodes of the computation graph, while ignoring the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-26 Linpeng Tang, Yida Wang, Theodore L. Willke, Kai Li

GPGPU Processing in CUDA Architecture

The future of computation is the Graphics Processing Unit, i.e. the GPU. The promise that graphics cards have shown in the field of image processing and accelerated rendering of 3D scenes, and the computational capability that these…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-02-21 Jayshree Ghorpade, Jitendra Parande, Madhura Kulkarni, Amit Bawaskar

TripleID-Q: RDF Query Processing Framework using GPU

Resource Description Framework (RDF) data represents information linkage around the Internet. It uses Internationalized Resource Identifiers (IRIs), which can refer to external information. Typically, RDF data is serialized as a…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-05 Chantana Chantrapornchai, Chidchanok Choksuchat

PyFAI: a Python library for high performance azimuthal integration on GPU

The pyFAI package has been designed to reduce X-ray diffraction images into powder diffraction curves to be further processed by scientists. This contribution describes how to convert an image into a radial profile using the Numpy package,…

Instrumentation and Methods for Astrophysics · Physics 2014-12-22 Jérôme Kieffer, Giannis Ashiotis

Improving the performance of the linear systems solvers using CUDA

Parallel computing can offer an enormous performance advantage for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high-performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea, Tudorel Andrei, Raluca Mariana Dragoescu

PAGANI: A Parallel Adaptive GPU Algorithm for Numerical Integration

We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-24 Ioannis Sakiotis, Kamesh Arumugam, Marc Paterno, Desh Ranjan, Balša Terzić, Mohammad Zubair

DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime

GPUs are readily available in cloud computing and personal devices, but their use for data processing acceleration has been slowed down by their limited integration with common programming languages such as Python or Java. Moreover, using…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-20 Alberto Parravicini, Arnaud Delamare, Marco Arnaboldi, Marco D. Santambrogio

MPI-rical: Data-Driven MPI Distributed Parallelism Assistance with Transformers

Message Passing Interface (MPI) plays a crucial role in distributed memory parallelization across multiple nodes. However, parallelizing MPI code manually, and specifically, performing domain decomposition, is a challenging, error-prone…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-31 Nadav Schneider, Tal Kadosh, Niranjan Hasabnis, Timothy Mattson, Yuval Pinter, Gal Oren

Massively Parallel Ray Tracing Algorithm Using GPU

Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of high-quality global illumination at a heavy computational cost. Because of the high computation…

Graphics · Computer Science 2015-04-14 Yutong Qin, Jianbiao Lin, Xiang Huang

Data Parallel Path Tracing in Object Space

We investigate the concept of rendering production-style content with full path tracing in a data-distributed fashion -- that is, with multiple collaborating nodes and/or GPUs that each store only part of the model. In particular, we…

Graphics · Computer Science 2022-04-22 Ingo Wald, Steven G Parker

PaPy: Parallel and Distributed Data-processing Pipelines in Python

PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written…

Programming Languages · Computer Science 2014-07-17 Marcin Cieslik, Cameron Mura

MPIX Stream: An Explicit Solution to Hybrid MPI+X Programming

The hybrid MPI+X programming paradigm, where X refers to threads or GPUs, has gained prominence in the high-performance computing arena. This corresponds to a trend of system architectures growing more heterogeneous. The current MPI…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-31 Hui Zhou, Ken Raffenetti, Yanfei Guo, Rajeev Thakur

CrossRT: A cross platform programming technology for hardware-accelerated ray tracing in CG and CV applications

We propose a programming technology that bridges cross-platform compatibility and hardware acceleration in ray tracing applications. Our methodology enables developers to define algorithms while our translator manages implementation…

Graphics · Computer Science 2024-09-20 Vladimir Frolov, Vadim Sanzharov, Garifullin Albert, Maxim Raenchuk, Alexei Voloboy

CuPBoP: CUDA for Parallelized and Broad-range Processors

CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also allows data-parallel computation in…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-17 Ruobing Han, Jun Chen, Bhanu Garg, Jeffrey Young, Jaewoong Sim, Hyesoon Kim

RTGPU: Real-Time GPU Scheduling of Hard Deadline Parallel Tasks with Fine-Grain Utilization

Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Since these highly parallel applications are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-07 An Zou, Jing Li, Christopher D. Gill, Xuan Zhang