Related papers: FastSV: A Distributed-Memory Connected Component A…
We present an efficient distributed memory parallel algorithm for computing connected components in undirected graphs based on Shiloach-Vishkin's PRAM approach. We discuss multiple optimization techniques that reduce communication volume as…
Connected Components (CC) is a core graph problem with numerous applications. This paper investigates accelerating distributed CC by optimizing memory and network bandwidth utilization. We present two novel distributed CC algorithms,…
We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or…
Distributed-memory implementations of numerical optimization algorithm, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where…
Training Graph Neural Networks(GNNs) on a large monolithic graph presents unique challenges as the graph cannot fit within a single machine and it cannot be decomposed into smaller disconnected components. Distributed sampling-based…
We introduce FastGraph, a novel GPU-optimized k-nearest neighbor algorithm specifically designed to accelerate graph construction in low-dimensional spaces (2-10 dimensions), critical for high-performance graph neural networks. Our method…
Graph analytics are vital in fields such as social networks, biomedical research, and graph neural networks (GNNs). However, traditional CPUs and GPUs struggle with the memory bottlenecks caused by large graph datasets and their…
In this paper, we present an on-line fully dynamic algorithm for maintaining strongly connected component of a directed graph in a shared memory architecture. The edges and vertices are added or deleted concurrently by fixed number of…
Graph algorithms applied in many applications, including social networks, communication networks, VLSI design, graphics, and several others, require dynamic modifications -- addition and removal of vertices and/or edges -- in the graph.…
We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the…
Graph analysis performs many random reads and writes, thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so the aggregate memory exceeds the graph size. We…
Many problems such as node classification and link prediction in network data can be solved using graph embeddings. However, it is difficult to use graphs to capture non-binary relations such as communities of nodes. These kinds of complex…
This paper describes the adaptation of a well-scaling parallel algorithm for computing Morse-Smale segmentations based on path compression to a distributed computational setting. Additionally, we extend the algorithm to efficiently compute…
Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are…
Motivated by applications of distributed linear estimation, distributed control and distributed optimization, we consider the question of designing linear iterative algorithms for computing the average of numbers in a network. Specifically,…
Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks. As a practical solution to train GNN on large graphs with billions of nodes and…
Dynamic connectivity is a fundamental dynamic graph problem, and recent algorithmic breakthroughs on dynamic graph sketching have reshaped what is theoretically possible: by encoding the graph as per-vertex linear sketches, these algorithms…
We design and implement an efficient parallel algorithm for finding a perfect matching in a weighted bipartite graph such that weights on the edges of the matching are large. This problem differs from the maximum weight matching problem,…
Graph partitioning schedules parallel calculations like sparse matrix-vector multiply (SpMV). We consider contiguous partitions, where the $m$ rows (or columns) of a sparse matrix with $N$ nonzeros are split into $K$ parts without…
Distributed computing excels at processing large scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance. Fortunately, the interactions between parameter and data in many problems…