English
Related papers

Related papers: Lightning: Scaling the GPU Programming Model Beyon…

200 papers

We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. Directly using the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-02 Yuechao Pan , Yangzihao Wang , Yuduo Wu , Carl Yang , John D. Owens

GPUs are now used for a wide range of problems within HPC. However, making efficient use of the computational power available with multiple GPUs is challenging. The main challenges in achieving good performance are memory layout, affecting…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-20 Robert Clucas , Philip Blakely , Nikolaos Nikiforakis

Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and streaming multiprocessors (SMs). GPUs are an expensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-20 Chao Chen , Chris Porter , Santosh Pande

In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and executed cheaply, which fails in large…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-30 Ruifan Chu , Anbang Wang , Xiuxiu Bai , Shuai Liu , Xiaoshe Dong

Choosing an appropriate programming paradigm for high-performance computing on low-power devices can be useful to speed up calculations. Many Android devices have an integrated GPU and - although not officially supported - the OpenCL…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-10 Robert Fritze , Claudia Plant

Efficient deployment of large language models, particularly Mixture of Experts (MoE), on resource-constrained platforms presents significant challenges, especially in terms of computational efficiency and memory utilization. The MoE…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-20 Shiyi Cao , Shu Liu , Tyler Griggs , Peter Schafhalter , Xiaoxuan Liu , Ying Sheng , Joseph E. Gonzalez , Matei Zaharia , Ion Stoica

We introduce PennyLane's Lightning suite, a collection of high-performance state-vector simulators targeting CPU, GPU, and HPC-native architectures and workloads. Quantum applications such as QAOA, VQE, and synthetic workloads are…

Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-24 Bogdan Oancea , Tudorel Andrei , Raluca Mariana Dragoescu

A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two dimensional Ising model [T. Preis et al., J. Comp.…

Computational Physics · Physics 2010-07-22 Benjamin Block , Peter Virnau , Tobias Preis

The simulation of the two-dimensional Ising model is used as a benchmark to show the computational capabilities of Graphic Processing Units (GPUs). The rich programming environment now available on GPUs and flexible hardware capabilities…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-26 Joshua Romero , Mauro Bisson , Massimiliano Fatica , Massimo Bernaschi

The ability to model, analyze, and predict execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high performance,…

Performance · Computer Science 2020-06-22 James D. Stevens , Andreas Klöckner

Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware -- an increasingly prohibitive step due to limited access to high-end GPUs. This paper explores whether Large…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-08 Gregory Bolet , Giorgis Georgakoudis , Harshitha Menon , Konstantinos Parasyris , Niranjan Hasabnis , Hayden Estes , Kirk W. Cameron , Gal Oren

GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and…

Machine Learning · Computer Science 2026-05-22 Jiachang Liu , Andrea Lodi

Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes, or even, petabyte-scale. We introduce a hybrid…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-17 Davit Buniatyan

Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs'…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-05 Frédéric Magoulès , Abal-Kassim Cheik Ahamed , Alban Desmaison , Jean-Christophe Léchenet , François Mayer , Haifa Ben Salem , Thomas Zhu

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-06 Yangzihao Wang , Yuechao Pan , Andrew Davidson , Yuduo Wu , Carl Yang , Leyuan Wang , Muhammad Osama , Chenshan Yuan , Weitang Liu , Andy T. Riffel , John D. Owens

GPU clouds have become a popular computing platform because of the cost of owning and maintaining high-performance computing clusters. Many cloud architectures have also been proposed to ensure a secure execution environment for guest…

Cryptography and Security · Computer Science 2021-12-08 Rihui Sun , Pefei Qiu , Yongqiang Lyu , Donsheng Wang , Jiang Dong , Gang Qu

The development of composable systems architecture marks a significant shift in resource allocation and utilization within data centers. This paper presents a composable architecture scaling up to 32 GPUs on a single node, addressing the…

Emerging Technologies · Computer Science 2024-04-10 John Ihnotic

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li , Ari B. Hayes , Stephen A. Hackler , Eddy Z. Zhang , Mario Szegedy , Shuaiwen Leon Song

In the quest for highest performance in scientific computing, we present a novel framework that relies on high-bandwidth communication between GPUs in a compute cluster. The framework offers linear scaling of performance for explicit…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-16 Martin Rose , Simon Homes , Lukas Ramsperger , Jose Gracia , Christoph Niethammer , Jadran Vrabec
‹ Prev 1 2 3 10 Next ›