Related papers: GPU Load Balancing

200 papers

We propose a GPU fine-grained load-balancing abstraction that decouples load balancing from work processing and aims to support both static and dynamic schedules with a programmable interface to implement new load-balancing schedules. Prior…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Muhammad Osama , Serban D. Porumbescu , John D. Owens

We introduce Stream-K, a work-centric parallelization of matrix multiplication (GEMM) and related computations in dense linear algebra. Whereas contemporary decompositions are primarily tile-based, our method operates by partitioning an…

Data Structures and Algorithms · Computer Science 2023-01-11 Muhammad Osama , Duane Merrill , Cris Cecka , Michael Garland , John D. Owens

Load-balancing among the threads of a GPU for graph analytics workloads is difficult because of the irregular nature of graph applications and the high variability in vertex degrees, particularly in power-law graphs. We describe a novel…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-28 Vishwesh Jatala , Loc Hoang , Roshan Dathathri , Gurbinder Gill , V Krishna Nandivada , Keshav Pingali

General matrix multiplication (GEMM) operations are the fundamental building blocks of computational domains including artificial intelligence (AI). As GPU architectures evolve and high-performance AI becomes increasingly important,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-26 Harisankar Sadasivan , Muhammed Emin Ozturk , Muhammad Osama , Chris Millette , Astha Rai , Maksim Podkorytov , John Afaganis , Carlus Huang , Jing Zhang , Jun Liu

Acceleration of graph applications on GPUs has found large interest due to the ubiquitous use of graph processing in various domains. The inherent irregularity in graph applications leads to several challenges for parallelization.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-02 Ananya Raval , Rupesh Nasre , Vivek Kumar , Vasudevan R , Sathish Vadhiyar , Keshav Pingali

Maintaining computational load balance is important to the performant behavior of codes which operate under a distributed computing model. This is especially true for GPU architectures, which can suffer from memory oversubscription if…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-05 Michael E. Rowan , Axel Huebl , Kevin N. Gott , Jack Deslippe , Maxence Thévenet , Remi Lehe , Jean-Luc Vay

In order to satisfy timing constraints, modern real-time applications require massively parallel accelerators such as General Purpose Graphic Processing Units (GPGPUs). Generation after generation, the number of computing clusters made…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-24 Houssam-Eddine Zahaf , Ignacio Sanudo Olmedo , Jayati Singh , Nicola Capodieci , Sebastien Faucou

3D Gaussian Splatting (3DGS) is increasingly attracting attention in both academia and industry owing to its superior visual quality and rendering speed. However, training a 3DGS model remains a time-intensive task, especially in load…

Computer Vision and Pattern Recognition · Computer Science 2025-05-09 Hao Gui , Lin Hu , Rui Chen , Mingxiao Huang , Yuxin Yin , Jin Yang , Yong Wu , Chen Liu , Zhongxu Sun , Xueyang Zhang , Kun Zhan

In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-18 Mark Blanco , Tze Meng Low , Kyungjoo Kim

The ability to model, analyze, and predict execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high performance,…

Performance · Computer Science 2020-06-22 James D. Stevens , Andreas Klöckner

We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies. To handle varying network configurations and enable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-15 Minseok Ryu , Geunyeong Byeon , Kibaek Kim

Nowadays, the data to be processed by database systems has grown so large that any conventional, centralized technique is inadequate. At the same time, general purpose computation on GPU (GPGPU) recently has successfully drawn attention…

Databases · Computer Science 2013-09-04 Georgios Koutsoumpakis , Iakovos Koutsoumpakis , Anastasios Gounaris

Mixture-of-Experts (MoE) has emerged as a promising approach to scale up deep learning models due to its significant reduction in computational resources. However, the dynamic nature of MoE leads to load imbalance among experts, severely…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-16 Chenqi Zhao , Wenfei Wu , Linhai Song , Yuchen Xu , Yitao Yuan

General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-03 Shixun Wu , Yujia Zhai , Jinyang Liu , Jiajun Huang , Zizhe Jian , Bryan M. Wong , Zizhong Chen

We propose a GPU-based distributed optimization algorithm, aimed at controlling optimal power flow in multi-phase and unbalanced distribution systems. Typically, conventional distributed optimization algorithms employed in such scenarios…

Optimization and Control · Mathematics 2023-10-17 Minseok Ryu , Geunyeong Byeon , Kibaek Kim

It has long been a problem to arrange and execute irregular workloads on massively parallel devices. We propose a general framework for statically batching irregular workloads into a single kernel with a runtime task mapping mechanism on…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-28 Yinghan Li , Yifei Li , Jiejing Zhang , Bujiao Chen , Xiaotong Chen , Lian Duan , Yejun Jin , Zheng Li , Xuanyu Liu , Haoyu Wang , Wente Wang , Yajie Wang , Jiacheng Yang , Peiyang Zhang , Laiwen Zheng , Wenyuan Yu

GPUs have been widely used to accelerate computations exhibiting simple patterns of parallelism - such as flat or two-level parallelism - and a degree of parallelism that can be statically determined based on the size of the input dataset.…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-18 Hancheng Wu , Da Li , Michela Becchi

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Paul Elvinger , Foteini Strati , Natalie Enright Jerger , Ana Klimovic

Analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency. Our study employs two approaches: a custom-implemented tiled matrix…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-27 Xiaoteng Liu , Pavly Halim

Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-30 Kshiteej Mahajan , Arjun Balasubramanian , Arjun Singhvi , Shivaram Venkataraman , Aditya Akella , Amar Phanishayee , Shuchi Chawla