Related papers: PSelInv -- A Distributed Memory Parallel Algorithm…

PSelInv - A Distributed Memory Parallel Algorithm for Selected Inversion: the non-symmetric Case

This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse non-symmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel machine, where L, U are…

Mathematical Software · Computer Science 2017-08-16 Mathias Jacquelin, Lin Lin, Chao Yang

Enhancing the scalability and load balancing of the parallel selected inversion algorithm via tree-based asynchronous communication

We develop a method for improving the parallel scalability of the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed memory machines. In the PSelInv…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-04-21 Mathias Jacquelin, Lin Lin, Nathan Wichmann, Chao Yang

A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems

Given a sparse matrix $A$, the selected inversion algorithm is an efficient method for computing certain selected elements of $A^{-1}$. These selected elements correspond to all or some nonzero elements of the LU factors of $A$. In many…

Mathematical Software · Computer Science 2016-04-12 Mathias Jacquelin, Lin Lin, Weile Jia, Yonghua Zhao, Chao Yang

GPU-Accelerated Parallel Selected Inversion for Structured Matrices Using sTiles

Selected inversion is essential for applications such as Bayesian inference, electronic structure calculations, and inverse covariance estimation, where computing only specific elements of large sparse matrix inverses significantly reduces…

Performance · Computer Science 2025-09-03 Esmail Abdul Fattah, Hatem Ltaief, Havard Rue, David Keyes

A distributed-memory hierarchical solver for general sparse linear systems

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it…

Numerical Analysis · Mathematics 2017-12-21 Chao Chen, Hadi Pouransari, Sivasankaran Rajamanickam, Erik G. Boman, Eric Darve

A work-efficient parallel sparse matrix-sparse vector multiplication algorithm

We design and develop a work-efficient multithreaded algorithm for sparse matrix-sparse vector multiplication (SpMSpV) where the matrix, the input vector, and the output vector are all sparse. SpMSpV is an important primitive in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-26 Ariful Azad, Aydin Buluc

Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments

Generalized sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. Here we show that SpGEMM also yields efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-03-19 Aydin Buluc, John Gilbert

Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication

We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, called TS-SpGEMM, has important applications in multi-source breadth-first search,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-08-23 Isuru Ranawaka, Md Taufique Hussain, Charles Block, Gerasimos Gerogiannis, Josep Torrellas, Ariful Azad

A Scalable Shared-Memory Parallel Simplex for Large-Scale Linear Programming

The Simplex tableau has been broadly used and investigated in the industry and academia. With the advent of the big data era, ever larger problems are posed to be solved in ever larger machines whose architecture type did not exist in the…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-05-29 Demetrios Coutinho, Felipe O. Lins e Silva, Daniel Aloise, Samuel Xavier-de-Souza

Algorithms for Parallel Shared-Memory Sparse Matrix-Vector Multiplication on Unstructured Matrices

The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructured sparse matrices, which are more…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-27 Kobe Bergmans, Karl Meerbergen, Raf Vandebril

Javelin: A Scalable Implementation for Sparse Incomplete LU Factorization

In this work, we present a new scalable incomplete LU factorization framework called Javelin to be used as a preconditioner for solving sparse linear systems with iterative methods. Javelin allows for improved parallel factorization on…

Mathematical Software · Computer Science 2019-05-06 Joshua Dennis Booth, Gregory Bolet

Semi-External Memory Sparse Matrix Multiplication for Billion-Node Graphs

Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-15 Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua Vogelstein, Carey E. Priebe, Randal Burns

Distributed-memory Algorithms for Sparse Matrix Permutation, Extraction, and Assignment

We present scalable distributed-memory algorithms for sparse matrix permutation, extraction, and assignment. Our methods follow an Identify-Exchange-Build (IEB) strategy where each process identifies the local nonzeros to be sent, exchanges…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-26 Elaheh Hassani, Md Taufique Hussain, Ariful Azad

Serinv: A Scalable Library for the Selected Inversion of Block-Tridiagonal with Arrowhead Matrices

The inversion of structured sparse matrices is a key but computationally and memory-intensive operation in many scientific applications. There are cases, however, where only particular entries of the full inverse are required. This has…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-25 Vincent Maillou, Lisa Gaedke-Merzhaeuser, Alexandros Nikolaos Ziogas, Olaf Schenk, Mathieu Luisier

Parallel Algorithms for Tensor Train Arithmetic

We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms…

Numerical Analysis · Mathematics 2021-09-08 Hussam Al Daas, Grey Ballard, Peter Benner

A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices

We present the submatrix method, a highly parallelizable method for the approximate calculation of inverse p-th roots of large sparse symmetric matrices which are required in different scientific applications. We follow the idea of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-06 Michael Lass, Stephan Mohr, Hendrik Wiebeler, Thomas D. Kühne, Christian Plessl

Accelerated Parallel and Distributed Algorithm using Limited Internal Memory for Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a powerful technique for dimension reduction, extracting latent factors and learning part-based representation. For large datasets, NMF performance depends on some major issues: fast algorithms,…

Optimization and Control · Mathematics 2015-07-01 Duy-Khuong Nguyen, Tu-Bao Ho

Highly Parallel Sparse Matrix-Matrix Multiplication

Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-09 Aydın Buluç, John R. Gilbert

Scalable hierarchical parallel algorithm for the solution of super large-scale sparse linear equations

The parallel linear equations solver capable of effectively using 1000+ processors becomes the bottleneck of large-scale implicit engineering simulations. In this paper, we present a new hierarchical parallel master-slave-structural…

Computational Physics · Physics 2015-06-11 Ran Xu, Bin Liu, Yuan Dong

Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution

Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the…

Programming Languages · Computer Science 2026-04-23 Atharva Chougule, Alexander J Root, Rubens Lacouture, Bobby Yan, Rohan Yadav, Fredrik Kjolstad