Related papers: A distributed-memory package for dense Hierarchica…
We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which…
Randomized sampling has recently been proven a highly efficient technique for computing approximate factorizations of matrices that have low numerical rank. This paper describes an extension of such techniques to a wider class of matrices…
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it…
We present a unification and generalization of what is known in the literature as sequentially and hierarchically semi-separable (SSS and HSS) representations for matrices. Describing rank-structured representations of (inverses of) sparse…
While quantum algorithms for solving large scale systems of linear equations offer potentially exponential speedups, their application has largely been confined to sparse matrices. This work extends the scope of these algorithms to a broad…
Structured dense matrices result from boundary integral problems in electrostatics and geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal methods. Exploiting the structure of such matrices can reduce…
Modern large language models (LLMs) place extraordinary pressure on memory and compute budgets, making principled compression indispensable for both deployment and continued training. We present Hierarchical Sparse Plus Low-Rank (HSS)…
A randomized algorithm for computing a data-sparse representation of a given rank-structured matrix $A$ (a.k.a. an $H$-matrix) is presented. The algorithm draws on the randomized singular value decomposition (RSVD), and operates under the…
A parallel algorithm for loading large sparse matrices from files into the distributed memories of high performance computing (HPC) systems is presented. This algorithm was designed specifically for matrices stored in files in the space-efficient…
Hierarchical matrices are space and time efficient representations of dense matrices that exploit the low rank structure of matrix blocks at different levels of granularity. The hierarchically low rank block partitioning produces…
The hierarchical matrix framework partitions matrices into subblocks that are either small or of low numerical rank, enabling linear storage complexity and efficient matrix-vector multiplication. This work focuses on the $H^2$-matrix format…
A randomized algorithm for computing a compressed representation of a given rank-structured matrix $A \in \mathbb{R}^{N\times N}$ is presented. The algorithm interacts with $A$ only through its action on vectors. Specifically, it draws two…
We present a fast algorithm for linear least squares problems governed by hierarchically block separable (HBS) matrices. Such matrices are generally dense but data-sparse and can describe many important operators including those derived…
Identifying similar protein sequences is a core step in many computational biology pipelines such as detection of homologous protein sequences, generation of similarity protein graphs for downstream analysis, functional annotation and gene…
In this paper, an efficient divide-and-conquer (DC) algorithm is proposed for symmetric tridiagonal matrices based on ScaLAPACK and hierarchically semiseparable (HSS) matrices. HSS is an important type of rank-structured…
This paper presents an efficient method to perform Structured Matrix Approximation by Separation and Hierarchy (SMASH), when the original dense matrix is associated with a kernel function. Given points in a domain, a tree structure is first…
Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an…
Parallel linear equation solvers capable of effectively using 1000+ processors have become the bottleneck of large-scale implicit engineering simulations. In this paper, we present a new hierarchical parallel master-slave-structural…
This paper describes the adaptation of a well-scaling parallel algorithm for computing Morse-Smale segmentations based on path compression to a distributed computational setting. Additionally, we extend the algorithm to efficiently compute…
When solving partial differential equations (PDEs) using finite difference or finite element methods, efficient solvers are required for handling large sparse linear systems. In this paper, a recursive sparse LU decomposition for matrices…