Related papers: Toward the Graphics Turing Scale on a Blue Gene Su…

200 papers

The architecture of the BlueGene/L massively parallel supercomputer is described. Each computing node consists of a single compute ASIC plus 256 MB of external memory. The compute ASIC integrates two 700 MHz PowerPC 440 integer CPU cores,…

High Energy Physics - Lattice · Physics 2007-05-23 Gyan Bhanot, Dong Chen, Alan Gara, Pavlos Vranas

Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the…

Performance · Computer Science 2015-10-19 Tareq M. Malas, Aron J. Ahmadia, Jed Brown, John A. Gunnels, David E. Keyes

Genetic Programming (GP) is a computationally intensive technique which also has a high degree of natural parallelism. Parallel computing architectures have become commonplace, especially with regard to Graphics Processing Units (GPUs). Hence,…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-05 Darren M. Chitty

In this paper we present scaling results of an FFT library, FFTK, and a pseudospectral code, Tarang, on grid resolutions up to $8192^3$ using 65536 cores of Blue Gene/P and 196608 cores of Cray XC40 supercomputers. We observe that…

Computational Physics · Physics 2018-05-22 Anando G. Chatterjee, Mahendra K. Verma, Abhishek Kumar, Ravi Samtaney, Bilel Hadri, Rooh Khurram

General Purpose computing on Graphical Processing Units (GPGPU) has resulted in unprecedented levels of speedup over its CPU counterparts, allowing programmers to harness the computational power of GPU shader cores to accelerate other…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-20 Vani Nagarajan, Milind Kulkarni

Data management on GPUs has become increasingly relevant due to a tremendous rise in processing power and available GPU memory. Similar to main-memory systems, there is a need for performant GPU-resident index structures to speed up query…

Databases · Computer Science 2023-09-28 Justus Henneberg, Felix Schuhknecht

Our work addresses the enabling of the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications, on large-scale systems. This approach allows new-and potentially…

Distributed, Parallel, and Cluster Computing · Computer Science 2008-08-27 Ioan Raicu, Zhao Zhang, Mike Wilde, Ian Foster

Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of high-quality global illumination at a heavy computational cost. Because of the high computation…

Graphics · Computer Science 2015-04-14 Yutong Qin, Jianbiao Lin, Xiang Huang

We study ray reordering as a tool for increasing the performance of existing GPU ray tracing implementations. We focus on ray reordering that is fully agnostic to the particular trace kernel. We summarize the existing methods for computing…

Graphics · Computer Science 2025-06-16 Daniel Meister, Jakub Bokšanský, Michael Guthe, Jiří Bittner

We have extended the Falkon lightweight task execution framework to make loosely coupled programming on petascale systems a practical and useful programming model. This work studies and measures the performance factors involved in applying…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-17 Ioan Raicu, Zhao Zhang, Mike Wilde, Ian Foster, Pete Beckman, Kamil Iskra, Ben Clifford

Genetic Programming (GP) is a computationally intensive technique which is naturally parallel. Consequently, many attempts have been made to improve its run-time by exploiting highly parallel hardware such as GPUs. However, a…

Neural and Evolutionary Computing · Computer Science 2018-09-21 Darren M. Chitty

Beagle is a new software framework that enables execution of Genetic Programming tasks on the GPU. Currently available for symbolic regression, it processes individuals of the population and fitness cases for training in a way that…

Neural and Evolutionary Computing · Computer Science 2026-03-16 Nathan Haut, Ilya Basin, Marzieh Kianinejad, Ruchika Gupta, Elijah Smith, Zachary Perrico, Wolfgang Banzhaf

Driven by deep learning, there has been a surge of specialized processors for matrix multiplication, referred to as TensorCore Units (TCUs). These TCUs are capable of performing matrix multiplications on small matrices (usually 4x4 or…

Performance · Computer Science 2019-11-26 Abdul Dakkak, Cheng Li, Isaac Gelado, Jinjun Xiong, Wen-mei Hwu

Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train…

We present a high-performance, graphics processing unit (GPU)-based framework for the efficient analysis and visualization of (nearly) terabyte (TB)-sized 3-dimensional images. Using a cluster of 96 GPUs, we demonstrate for a 0.5 TB image:…

Instrumentation and Methods for Astrophysics · Physics 2015-06-12 A. H. Hassan, C. J. Fluke, D. G. Barnes, V. A. Kilborn

Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable,…

Numerical Analysis · Computer Science 2012-12-24 Xintian Yang, Srinivasan Parthasarathy, Ponnuswamy Sadayappan

The vision of a supercomputer at every desk can be realized by powerful and highly parallel CPUs, GPUs, or APUs. Graphics processors, once specialized for graphics applications only, are now used for the highly computationally intensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-04-16 Chittampally Vasanth Raja, Srinivas Balasubramanian, Prakash S Raghavendra

Over the last ten years, graphics processors have become the de facto accelerator for data-parallel tasks in various branches of high-performance computing, including machine learning and computational sciences. However, with the recent…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-28 Johannes Pekkilä, Oskar Lappi, Fredrik Robertsén, Maarit J. Korpi-Lagg

We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core…

Instrumentation and Methods for Astrophysics · Physics 2015-05-27 Satoshi Tanaka, Kohji Yoshikawa, Takashi Okamoto, Kenji Hasegawa

Gravitational lensing calculation using a direct inverse ray-shooting approach is a computationally expensive way to determine magnification maps, caustic patterns, and light-curves (e.g. as a function of source profile and size). However,…

Instrumentation and Methods for Astrophysics · Physics 2009-09-28 Alexander C. Thompson, Christopher J. Fluke, David G. Barnes, Benjamin R. Barsdell