Related papers: An Asynchronous Task-based Fan-Both Sparse Cholesk…
Cholesky factorization is a widely used method for solving linear systems involving symmetric, positive-definite matrices, and can be an attractive choice in applications where a high degree of numerical stability is needed. One such…
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block…
Direct factorization methods for the solution of large, sparse linear systems that arise from PDE discretizations are robust, but typically show poor time and memory scalability for large systems. In this paper, we describe an efficient…
The solution of sparse symmetric positive definite linear systems is an important computational kernel in large-scale scientific and engineering modeling and simulation. We will solve the linear systems using a direct method, in which a…
The algorithms in the current sequential numerical linear algebra libraries (e.g., LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research…
Sparse linear algebra routines are fundamental building blocks of a large variety of scientific applications. Direct solvers, which are methods for solving linear systems via the factorization of matrices into products of triangular…
Efficient solutions of large-scale, ill-conditioned and indefinite algebraic equations are ubiquitously needed in numerous computational fields, including multiphysics simulations, machine learning, and data science. Because of their…
We present a fast sparse matrix permutation algorithm tailored to linear systems arising from triangle meshes. Our approach produces nested-dissection-style permutations while significantly reducing permutation runtime overhead. Rather than…
Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for…
In this paper, we consider two fundamental symmetric kernels in linear algebra: the Cholesky factorization and the symmetric rank-$k$ update (SYRK), with the classical three-nested-loop algorithms for these kernels. In addition, we…
We present three methods for distributed memory parallel inverse factorization of block-sparse Hermitian positive definite matrices. The three methods are a recursive variant of the AINV inverse Cholesky algorithm, iterative refinement, and…
Persistent homology is a leading tool in topological data analysis (TDA). Many problems in TDA can be solved via homological (and indeed linear) algebra. However, matrices in this domain are typically large, with rows and columns…
This paper explores the performance optimization of out-of-core (OOC) Cholesky factorization on shared-memory systems equipped with multiple GPUs. We employ fine-grained computational tasks to expose concurrency while creating opportunities…
This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle…
We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which…
In recent years, there has been widespread adoption of machine learning-based approaches to automate the solving of partial differential equations (PDEs). Among these approaches, Gaussian processes (GPs) and kernel methods have garnered…
As multicore systems continue to gain ground in the High Performance Computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these…
We present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use…
Cholesky linear solvers are a critical bottleneck in challenging applications within computer graphics and scientific computing. These applications include but are not limited to elastodynamic barrier methods such as Incremental Potential…
Even distribution of irregular workload to processing units is crucial for efficient parallelization in many applications. In this work, we are concerned with a spatial partitioning called rectilinear partitioning (also known as generalized…