Related papers: PGAbB: A Block-Based Graph Processing Framework fo…

GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU

High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-16 Carl Yang, Aydin Buluc, John D. Owens

From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs

GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and…

Machine Learning · Computer Science 2026-05-22 Jiachang Liu, Andrea Lodi

Scaling Up Large-Scale Graph Processing for GPU-Accelerated Heterogeneous Systems

Not only with the large host memory for supporting large scale graph processing, GPU-accelerated heterogeneous architecture can also provide a great potential for high-performance computing. However, few existing heterogeneous systems can…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-05 Xianliang Li

GraphCage: Cache Aware Graph Processing on GPUs

Efficient Graph processing is challenging because of the irregularity of graph algorithms. Using GPUs to accelerate irregular graph algorithms is even more difficult to be efficient, since GPU's highly structured SIMT architecture is not a…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-05 Xuhao Chen

GraVF-M: Graph Processing System Generation for Multi-FPGA Platforms

Due to the irregular nature of connections in most graph datasets, partitioning graph analysis algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Nina Engelhardt, Hayden K.-H. So

Optimizing Graph Processing and Preprocessing with Hardware Assisted Propagation Blocking

Extensive prior research has focused on alleviating the characteristic poor cache locality of graph analytics workloads. However, graph pre-processing tasks remain relatively unexplored. In many important scenarios, graph pre-processing…

Hardware Architecture · Computer Science 2020-11-18 Vignesh Balaji, Brandon Lucia

Pangolin: An Efficient and Flexible Graph Pattern Mining System on CPU and GPU

There is growing interest in graph pattern mining (GPM) problems such as motif counting. GPM systems have been developed to provide unified interfaces for programming algorithms for these problems and for running them on parallel systems.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-22 Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Keshav Pingali

Scalable Graph Embedding Learning On A Single GPU

Graph embedding techniques have attracted growing interest since they convert the graph data into continuous and low-dimensional space. Effective graph analytic provides users a deeper understanding of what is behind the data and thus can…

Machine Learning · Computer Science 2022-01-21 Azita Nouri, Philip E. Davis, Pradeep Subedi, Manish Parashar

Demystifying Memory Access Patterns of FPGA-Based Graph Processing Accelerators

Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…

Hardware Architecture · Computer Science 2021-04-19 Jonas Dann, Daniel Ritter, Holger Fröning

GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs

Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine…

Hardware Architecture · Computer Science 2022-06-20 Jonas Dann, Daniel Ritter, Holger Fröning

FGPGA: An Efficient Genetic Approach for Producing Feasible Graph Partitions

Graph partitioning, a well studied problem of parallel computing has many applications in diversified fields such as distributed computing, social network analysis, data mining and many other domains. In this paper, we introduce FGPGA, an…

Neural and Evolutionary Computing · Computer Science 2014-11-18 Md. Lisul Islam, Novia Nurain, Swakkhar Shatabda, M Sohel Rahman

End-to-end Mapping in Heterogeneous Systems Using Graph Representation Learning

To enable heterogeneous computing systems with autonomous programming and optimization capabilities, we propose a unified, end-to-end, programmable graph representation learning (PGL) framework that is capable of mining the complexity of…

Machine Learning · Computer Science 2022-04-27 Yao Xiao, Guixiang Ma, Nesreen K. Ahmed, Mihai Capota, Theodore Willke, Shahin Nazarian, Paul Bogdan

Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems

The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-12-08 Abdullah Gharaibeh, Tahsin Reza, Elizeu Santos-Neto, Lauro Beltrao Costa, Scott Sallinen, Matei Ripeanu

Scalable and Adaptive Parallel Training of Graph Transformer on Large Graphs

Graph foundation models have demonstrated remarkable adaptability across diverse downstream tasks through large-scale pretraining on graphs. However, existing implementations of the backbone model, graph transformers, are typically limited…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-21 Jun-Liang Lin, Kamesh Madduri, Mahmut Taylan Kandemir

GraphLab: A New Framework for Parallel Machine Learning

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…

Machine Learning · Computer Science 2010-06-28 Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Joseph M. Hellerstein

GraphLab: A New Framework For Parallel Machine Learning

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…

Machine Learning · Computer Science 2014-08-12 Yucheng Low, Joseph E. Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E. Guestrin, Joseph Hellerstein

Exploring the Limits of GPUs With Parallel Graph Algorithms

In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-25 Frank Dehne, Kumanan Yogaratnam

GraphHP: A Hybrid Platform for Iterative Graph Processing

The Bulk Synchronous Parallel (BSP) computational model has emerged as the dominant distributed framework to build large-scale iterative graph processing systems. While its implementations (e.g., Pregel, Giraph, and Hama) achieve high…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-23 Qun Chen, Song Bai, Zhanhuai Li, Zhiying Gou, Bo Suo, Wei Pan

FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics

Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both hardware and software. Even though…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-13 Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajing Tang, Gang Pan, Fei Wu, Bingsheng He, Gustavo Alonso

A New Benchmark For Evaluation Of Graph-Theoretic Algorithms

We propose a new graph-theoretic benchmark in this paper. The benchmark is developed to address shortcomings of an existing widely-used graph benchmark. We thoroughly studied a large number of traditional and contemporary graph algorithms…

Performance · Computer Science 2010-05-06 Andy B. Yoo, Yang Liu, Sheila Vaidya, Stephen Poole