Related papers: The Reverse Cuthill-McKee Algorithm in Distributed…
The Reverse Cuthill-McKee (RCM) algorithm is a graph-based method for reordering sparse matrices, renowned for its effectiveness in minimizing matrix bandwidth and profile. This reordering enhances the efficiency of matrix operations,…
Sparse matrices, as prevalent primitive of various scientific computing algorithms, persist as a bottleneck in processing. A skew-symmetric matrix flips signs of symmetric pairs in a symmetric matrix. Our work, Parallel 3-Way Banded…
Existing block Kaczmarz methods face challenges in balancing computational efficiency and convergence for large sparse linear systems with scattered nonzero patterns, due to costly partitioning strategies and non-orthogonal projections. In…
We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, called TS-SpGEMM, has important applications in multi-source breadth-first search,…
Computing fixed-radius near-neighbor graphs is an important first step for many data analysis algorithms. Near-neighbor graphs connect points that are close under some metric, endowing point clouds with a combinatorial structure. As…
This paper describes the adaptation of a well-scaling parallel algorithm for computing Morse-Smale segmentations based on path compression to a distributed computational setting. Additionally, we extend the algorithm to efficiently compute…
We present shared-memory parallel methods for Maximal Clique Enumeration (MCE) from a graph. MCE is a fundamental and well-studied graph analytics task, and is a widely used primitive for identifying dense structures in a graph. Due to its…
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation in processing graphs. Recently, size, variety, and structural complexity of these networks has grown dramatically.…
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it…
Multiplying two sparse matrices (SpGEMM) is a common computational primitive used in many areas including graph algorithms, bioinformatics, algebraic multigrid solvers, and randomized sketching. Distributed-memory parallel algorithms for…
Maximal Clique Enumeration (MCE) is a fundamental graph mining problem, and is useful as a primitive in identifying dense structures in a graph. Due to the high computational cost of MCE, parallel methods are imperative for dealing with…
Partitioning a graph into blocks of roughly equal weight while cutting only few edges is a fundamental problem in computer science with numerous practical applications. While shared-memory parallel partitioners have recently matured to…
Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient…
This paper presents a novel meta algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our…
Counting triangles in a graph and incident to each vertex is a fundamental and frequently considered task of graph analysis. We consider how to efficiently do this for huge graphs using massively parallel distributed-memory machines.…
We suggest a technique to reduce the storage size of sparse matrices at no loss of information. We call this technique Diagonally-Adressed (DA) storage. It exploits the typically low matrix bandwidth of matrices arising in applications. For…
The approximate minimum degree algorithm is widely used before numerical factorization to reduce fill-in for sparse matrices. While considerable attention has been given to the numerical factorization process, less focus has been placed on…
In this paper, we investigate the parallelization of $k$-core decomposition, a method used in graph analysis to identify cohesive substructures and assess node centrality. Although efficient sequential algorithms exist for this task, the…
In the big data era, graph computing is widely used to exploit the hidden value in real-world graphs in various scenarios such as social networks, knowledge graphs, web searching, and recommendation systems. However, the random memory…
Even distribution of irregular workload to processing units is crucial for efficient parallelization in many applications. In this work, we are concerned with a spatial partitioning called rectilinear partitioning (also known as generalized…