Related papers: A Graph-based Model for GPU Caching Problems

A Graph-Partition-Based Scheduling Policy for Heterogeneous Architectures

In order to improve system performance efficiently, a number of systems choose to equip multi-core and many-core processors (such as GPUs). Due to their discrete memory these heterogeneous architectures comprise a distributed system within…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-02-27 Hao Wu , Daniel Lohmann , Wolfgang Schröder-Preikschat

Contention-Aware GPU Partitioning and Task-to-Partition Allocation for Real-Time Workloads

In order to satisfy timing constraints, modern real-time applications require massively parallel accelerators such as General Purpose Graphic Processing Units (GPGPUs). Generation after generation, the number of computing clusters made…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-24 Houssam-Eddine Zahaf , Ignacio Sanudo Olmedo , Jayati Singh , Nicola Capodieci , Sebastien Faucou

GPU-Accelerated Algorithms for Process Mapping

Process mapping asks to assign vertices of a task graph to processing elements of a supercomputer such that the computational workload is balanced while the communication cost is minimized. Motivated by the recent success of GPU-based graph…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-16 Petr Samoldekin , Christian Schulz , Henning Woydt

Dynamic Load Balancing Strategies for Graph Applications on GPUs

Acceleration of graph applications on GPUs has found large interest due to the ubiquitous use of graph processing in various domains. The inherent \textit{irregularity} in graph applications leads to several challenges for parallelization.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-02 Ananya Raval , Rupesh Nasre , Vivek Kumar , Vasudevan R , Sathish Vadhiyar , Keshav Pingali

High-Quality Shared-Memory Graph Partitioning

Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation in processing graphs. Recently, size, variety, and structural complexity of these networks has grown dramatically.…

Data Structures and Algorithms · Computer Science 2018-10-16 Yaroslav Akhremtsev , Peter Sanders , Christian Schulz

Scalable Graph Embedding LearningOn A Single GPU

Graph embedding techniques have attracted growing interest since they convert the graph data into continuous and low-dimensional space. Effective graph analytic provides users a deeper understanding of what is behind the data and thus can…

Machine Learning · Computer Science 2022-01-21 Azita Nouri , Philip E. Davis , Pradeep Subedi , Manish Parashar

GPUSCAN$^{++}$:Efficient Structural Graph Clustering on GPUs

Structural clustering is one of the most popular graph clustering methods, which has achieved great performance improvement by utilizing GPUs. Even though, the state-of-the-art GPU-based structural clustering algorithm, GPUSCAN, still…

Databases · Computer Science 2023-12-01 Long Yuan , Zeyu Zhou , Xuemin Lin , Zi Chen , Xiang Zhao , Fan Zhang

Graph Partitioning via Parallel Submodular Approximation to Accelerate Distributed Machine Learning

Distributed computing excels at processing large scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance. Fortunately, the interactions between parameter and data in many problems…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-19 Mu Li , Dave G. Andersen , Alexander J. Smola

Efficiently Scheduling Parallel DAG Tasks on Identical Multiprocessors

Parallel real-time embedded applications can be modelled as directed acyclic graphs (DAGs) whose nodes model subtasks and whose edges model precedence constraints among subtasks. Efficiently scheduling such parallel tasks can be challenging…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-24 Shardul Lendve , Konstantinos Bletsas , Pedro F. Souto

Modeling Task Mapping for Data-intensive Applications in Heterogeneous Systems

We introduce a new model for the task mapping problem to aid in the systematic design of algorithms for heterogeneous systems including, but not limited to, CPUs, GPUs and FPGAs. A special focus is set on the communication between the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-15 Martin Wilhelm , Hanna Geppert , Anna Drewes , Thilo Pionteck

CaPGNN: Optimizing Parallel Graph Neural Network Training with Joint Caching and Resource-Aware Graph Partitioning

Graph-structured data is ubiquitous in the real world, and Graph Neural Networks (GNNs) have become increasingly popular in various fields due to their ability to process such irregular data directly. However, as data scale, GNNs become…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-10 Xianfeng Song , Yi Zou , Zheng Shi

Scalable Graph Convolutional Network Training on Distributed-Memory Systems

Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs. The large data sizes of graphs and their vertex features make scalable training algorithms and distributed memory systems necessary. Since the…

Machine Learning · Computer Science 2022-12-14 Gunduz Vehbi Demirci , Aparajita Haldar , Hakan Ferhatosmanoglu

Technical Report: Accelerating Dynamic Graph Analytics on GPUs

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science 2018-06-28 Mo Sha , Yuchen Li , Bingsheng He , Kian-Lee Tan

Power Consumption Analysis of Parallel Algorithms on GPUs

Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs'…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-05 Frédéric Magoulès , Abal-Kassim Cheik Ahamed , Alban Desmaison , Jean-Christophe Léchenet , François Mayer , Haifa Ben Salem , Thomas Zhu

Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems

The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-12-08 Abdullah Gharaibeh , Tahsin Reza , Elizeu Santos-Neto , Lauro Beltrao Costa , Scott Sallinen , Matei Ripeanu

A Feasible Graph Partition Framework for Random Walks Implemented by Parallel Computing in Big Graph

Graph partition is a fundamental problem of parallel computing for big graph data. Many graph partition algorithms have been proposed to solve the problem in various applications, such as matrix computations and PageRank, etc., but none has…

Social and Information Networks · Computer Science 2015-01-05 Xiaoming Liu , Yadong Zhou , Xiaohong Guan

Exploring the Limits of GPUs With Parallel Graph Algorithms

In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-25 Frank Dehne , Kumanan Yogaratnam

Thread Batching for High-performance Energy-efficient GPU Memory Design

Massive multi-threading in GPU imposes tremendous pressure on memory subsystems. Due to rapid growth in thread-level parallelism of GPU and slowly improved peak memory bandwidth, the memory becomes a bottleneck of GPU's performance and…

Hardware Architecture · Computer Science 2019-06-17 Bing Li , Mengjie Mao , Xiaoxiao Liu , Tao Liu , Zihao Liu , Wujie Wen , Yiran Chen , Hai , Li

Faster Vertex Cover Algorithms on GPUs with Component-Aware Parallel Branching

Algorithms for finding minimum or bounded vertex covers in graphs use a branch-and-reduce strategy, which involves exploring a highly imbalanced search tree. Prior GPU solutions assign different thread blocks to different sub-trees, while…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-29 Hussein Amro , Basel Fakhri , Amer E. Mouawad , Izzat El Hajj

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy-efficiency of such systems is still one of the most critical issues. As one single program typically…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-08 Eishi Arima , Minjoon Kang , Issa Saba , Josef Weidendorfer , Carsten Trinitis , Martin Schulz