Related papers: GPU Accelerated Sparse Cholesky Factorization
Direct factorization methods for the solution of large, sparse linear systems that arise from PDE discretizations are robust, but typically show poor time and memory scalability for large systems. In this paper, we describe an efficient…
In recent years, there has been widespread adoption of machine learning-based approaches to automate the solving of partial differential equations (PDEs). Among these approaches, Gaussian processes (GPs) and kernel methods have garnered…
We present a fast sparse matrix permutation algorithm tailored to linear systems arising from triangle meshes. Our approach produces nested-dissection-style permutations while significantly reducing permutation runtime overhead. Rather than…
The solution of a sparse system of linear equations is ubiquitous in scientific applications. Iterative methods, such as the Preconditioned Conjugate Gradient method (PCG), are normally chosen over direct methods due to memory and…
Sparse linear algebra routines are fundamental building blocks of a large variety of scientific applications. Direct solvers, which are methods for solving linear systems via the factorization of matrices into products of triangular…
This paper presents a GPU-accelerated framework for solving block tridiagonal linear systems that arise naturally in numerous real-time applications across engineering and scientific computing. Through a multi-stage permutation strategy…
Cholesky factorization is a widely used method for solving linear systems involving symmetric, positive-definite matrices, and can be an attractive choice in applications where a high degree of numerical stability is needed. One such…
This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle…
We introduce a parallel algorithm to construct a preconditioner for solving a large, sparse linear system where the coefficient matrix is a Laplacian matrix (a.k.a., graph Laplacian). Such a linear system arises from applications such as…
This paper introduces an efficient and generic framework for finite-element simulations under an implicit time integration scheme. Being compatible with generic constitutive models, a fast matrix assembly method exploits the fact that…
Systems of linear equations arise at the heart of many scientific and engineering applications. Many of these linear systems are sparse; i.e., most of the elements in the coefficient matrix are zero. Direct methods based on matrix…
We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the…
Incomplete factorizations have long been popular general-purpose algebraic preconditioners for solving large sparse linear systems of equations. Guaranteeing the factorization is breakdown free while computing a high quality preconditioner…
Dense kernel matrices resulting from pairwise evaluations of a kernel function arise naturally in machine learning and statistics. Previous work in constructing sparse approximate inverse Cholesky factors of such matrices by minimizing…
Kernel methods represent some of the most popular machine learning tools for data analysis. Since exact kernel methods can be prohibitively expensive for large problems, reliable low-rank matrix approximations and high-performance…
LU factorization for sparse matrices is the most important computing step for many engineering and scientific computing problems such as circuit simulation. But parallelizing LU factorization with the Graphic Processing Units (GPU) still…
Many important real-world applications, such as System Identification with Gaussian Processes, involve solving linear systems with symmetric positive-definite matrices. The iterative CG method and direct solvers based on the Cholesky…
Matrix Factorization (MF) on large scale data takes substantial time on a Central Processing Unit (CPU). While Graphical Processing Unit (GPU)s could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…
Our interest lies in the robust and efficient solution of large sparse linear least-squares problems. In recent years, hardware developments have led to a surge in interest in exploiting mixed precision arithmetic within numerical linear…
We propose a solution strategy for linear systems arising in interior method optimization, which is suitable for implementation on hardware accelerators such as graphical processing units (GPUs). The current gold standard for solving these…