Related papers: Toward the Graphics Turing Scale on a Blue Gene Su…

The BlueGene/L Supercomputer

The architecture of the BlueGene/L massively parallel supercomputer is described. Each computing node consists of a single compute ASIC plus 256 MB of external memory. The compute ASIC integrates two 700 MHz PowerPC 440 integer CPU cores,…

High Energy Physics - Lattice · Physics 2007-05-23 Gyan Bhanot, Dong Chen, Alan Gara, Pavlos Vranas

Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 Processor

Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the…

Performance · Computer Science 2015-10-19 Tareq M. Malas, Aron J. Ahmadia, Jed Brown, John A. Gunnels, David E. Keyes

Faster GPU Based Genetic Programming Using A Two Dimensional Stack

Genetic Programming (GP) is a computationally intensive technique which also has a high degree of natural parallelism. Parallel computing architectures have become commonplace especially with regards Graphics Processing Units (GPU). Hence,…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-05 Darren M. Chitty

Scaling of a Fast Fourier Transform and a Pseudo-spectral Fluid Solver up to 196608 cores

In this paper we present scaling results of a FFT library, FFTK, and a pseudospectral code, Tarang, on grid resolutions up to $8192^3$ grid using 65536 cores of Blue Gene/P and 196608 cores of Cray XC40 supercomputers. We observe that…

Computational Physics · Physics 2018-05-22 Anando G. Chatterjee, Mahendra K. Verma, Abhishek Kumar, Ravi Samtaney, Bilel Hadri, Rooh Khurram

RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware

General Purpose computing on Graphical Processing Units (GPGPU) has resulted in unprecedented levels of speedup over its CPU counterparts, allowing programmers to harness the computational power of GPU shader cores to accelerate other…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-20 Vani Nagarajan, Milind Kulkarni

RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing

Data management on GPUs has become increasingly relevant due to a tremendous rise in processing power and available GPU memory. Similar to main-memory systems, there is a need for performant GPU-resident index structures to speed up query…

Databases · Computer Science 2023-09-28 Justus Henneberg, Felix Schuhknecht

Enabling Loosely-Coupled Serial Job Execution on the IBM BlueGene/P Supercomputer and the SiCortex SC5832

Our work addresses the enabling of the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications, on large-scale systems. This approach allows new, and potentially…

Distributed, Parallel, and Cluster Computing · Computer Science 2008-08-27 Ioan Raicu, Zhao Zhang, Mike Wilde, Ian Foster

Massively Parallel Ray Tracing Algorithm Using GPU

Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of high-quality global illumination at a heavy computational cost. Because of the high computation…

Graphics · Computer Science 2015-04-14 Yutong Qin, Jianbiao Lin, Xiang Huang

On Ray Reordering Techniques for Faster GPU Ray Tracing

We study ray reordering as a tool for increasing the performance of existing GPU ray tracing implementations. We focus on ray reordering that is fully agnostic to the particular trace kernel. We summarize the existing methods for computing…

Graphics · Computer Science 2025-06-16 Daniel Meister, Jakub Bokšanský, Michael Guthe, Jiří Bittner

Towards Loosely-Coupled Programming on Petascale Systems

We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-17 Ioan Raicu, Zhao Zhang, Mike Wilde, Ian Foster, Pete Beckman, Kamil Iskra, Ben Clifford

Exploiting Tournament Selection for Efficient Parallel Genetic Programming

Genetic Programming (GP) is a computationally intensive technique which is naturally parallel in nature. Consequently, many attempts have been made to improve its run-time from exploiting highly parallel hardware such as GPUs. However, a…

Neural and Evolutionary Computing · Computer Science 2018-09-21 Darren M. Chitty

GPU-Accelerated Genetic Programming for Symbolic Regression with Beagle Framework

Beagle is a new software framework that enables execution of Genetic Programming tasks on the GPU. Currently available for symbolic regression, it processes individuals of the population and fitness cases for training in a way that…

Neural and Evolutionary Computing · Computer Science 2026-03-16 Nathan Haut, Ilya Basin, Marzieh Kianinejad, Ruchika Gupta, Elijah Smith, Zachary Perrico, Wolfgang Banzhaf

Accelerating Reduction and Scan Using Tensor Core Units

Driven by deep learning, there has been a surge of specialized processors for matrix multiplication, referred to as TensorCore Units (TCUs). These TCUs are capable of performing matrix multiplications on small matrices (usually 4x4 or…

Performance · Computer Science 2019-11-26 Abdul Dakkak, Cheng Li, Isaac Gelado, Jinjun Xiong, Wen-mei Hwu

Parallelizing Training of Deep Generative Models on Massive Scientific Datasets

Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-08 Sam Ade Jacobs, Brian Van Essen, David Hysom, Jae-Seung Yeom, Tim Moon, Rushil Anirudh, Jayaraman J. Thiagaranjan, Shusen Liu, Peer-Timo Bremer, Jim Gaffney, Tom Benson, Peter Robinson, Luc Peterson, Brian Spears

Tera-scale Astronomical Data Analysis and Visualization

We present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of (nearly) terabyte (TB)-sized 3-dimensional images. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image:…

Instrumentation and Methods for Astrophysics · Physics 2015-06-12 A. H. Hassan, C. J. Fluke, D. G. Barnes, V. A. Kilborn

Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable,…

Numerical Analysis · Computer Science 2012-12-24 Xintian Yang, Srinivasan Parthasarathy, Ponnuswamy Sadayappan

Heterogeneous Highly Parallel Implementation of Matrix Exponentiation Using GPU

The vision of super computer at every desk can be realized by powerful and highly parallel CPUs or GPUs or APUs. Graphics processors once specialized for the graphics applications only, are now used for the highly computational intensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-04-16 Chittampally Vasanth Raja, Srinivas Balasubramanian, Prakash S Raghavendra

Stencil Computations on AMD and Nvidia Graphics Processors: Performance and Tuning Strategies

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-28 Johannes Pekkilä, Oskar Lappi, Fredrik Robertsén, Maarit J. Korpi-Lagg

A new ray-tracing scheme for 3D diffuse radiation transfer on highly parallel architectures

We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core…

Instrumentation and Methods for Astrophysics · Physics 2015-05-27 Satoshi Tanaka, Kohji Yoshikawa, Takashi Okamoto, Kenji Hasegawa

Teraflop per second gravitational lensing ray-shooting using graphics processing units

Gravitational lensing calculation using a direct inverse ray-shooting approach is a computationally expensive way to determine magnification maps, caustic patterns, and light-curves (e.g. as a function of source profile and size). However,…

Instrumentation and Methods for Astrophysics · Physics 2009-09-28 Alexander C. Thompson, Christopher J. Fluke, David G. Barnes, Benjamin R. Barsdell