Related papers: PGAbB: A Block-Based Graph Processing Framework fo…
High-performance implementations of graph algorithms are challenging to implement on new parallel hardware such as GPUs because of three challenges: (1) the difficulty of coming up with graph building blocks, (2) load imbalance on parallel…
GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and…
Not only with the large host memory for supporting large scale graph processing, GPU-accelerated heterogeneous architecture can also provide a great potential for high-performance computing. However, few existing heterogeneous systems can…
Efficient Graph processing is challenging because of the irregularity of graph algorithms. Using GPUs to accelerate irregular graph algorithms is even more difficult to be efficient, since GPU's highly structured SIMT architecture is not a…
Due to the irregular nature of connections in most graph datasets, partitioning graph analysis algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic.…
Extensive prior research has focused on alleviating the characteristic poor cache locality of graph analytics workloads. However, graph pre-processing tasks remain relatively unexplored. In many important scenarios, graph pre-processing…
There is growing interest in graph pattern mining (GPM) problems such as motif counting. GPM systems have been developed to provide unified interfaces for programming algorithms for these problems and for running them on parallel systems.…
Graph embedding techniques have attracted growing interest since they convert the graph data into continuous and low-dimensional space. Effective graph analytic provides users a deeper understanding of what is behind the data and thus can…
Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine…
Graph partitioning, a well studied problem of parallel computing has many applications in diversified fields such as distributed computing, social network analysis, data mining and many other domains. In this paper, we introduce FGPGA, an…
To enable heterogeneous computing systems with autonomous programming and optimization capabilities, we propose a unified, end-to-end, programmable graph representation learning (PGL) framework that is capable of mining the complexity of…
The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However,…
Graph foundation models have demonstrated remarkable adaptability across diverse downstream tasks through large-scale pretraining on graphs. However, existing implementations of the backbone model, graph transformers, are typically limited…
Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…
Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…
In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…
The Bulk Synchronous Parallel(BSP) computational model has emerged as the dominant distributed framework to build large-scale iterative graph processing systems. While its implementations(e.g., Pregel, Giraph, and Hama) achieve high…
Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both hardware and software. Even though…
We propose a new graph-theoretic benchmark in this paper. The benchmark is developed to address shortcomings of an existing widely-used graph benchmark. We thoroughly studied a large number of traditional and contemporary graph algorithms…