Related papers: GLU3.0: Fast GPU-based Parallel Sparse LU Factoriz…

GSoFa: Scalable Sparse Symbolic LU Factorization on GPUs

Decomposing matrix A into a lower matrix L and an upper matrix U, which is also known as LU decomposition, is an essential operation in numerical linear algebra. For a sparse matrix, LU decomposition often introduces more nonzero entries in…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-11 Anil Gaihre, Xiaoye S. Li, Hang Liu

GPU accelerated matrix factorization of large scale data using block based approach

Matrix Factorization (MF) on large-scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…

Machine Learning · Computer Science 2023-04-28 Prasad Bhavana, Vineet Padmanabhan

HYLU: Hybrid Parallel Sparse LU Factorization

This article introduces HYLU, a hybrid parallel LU factorization-based general-purpose solver designed for efficiently solving sparse linear systems (Ax=b) on multi-core shared-memory architectures. The key technical feature of HYLU is the…

Hardware Architecture · Computer Science 2026-04-02 Xiaoming Chen

A Two-level GPU-Accelerated Incomplete LU Preconditioner for General Sparse Linear Systems

This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures,…

Numerical Analysis · Mathematics 2023-03-17 Tianshi Xu, Ruipeng Li, Daniel Osei-Kuffuor

GPU Accelerated Sparse Cholesky Factorization

The solution of sparse symmetric positive definite linear systems is an important computational kernel in large-scale scientific and engineering modeling and simulation. We will solve the linear systems using a direct method, in which a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-13 M. Ozan Karsavuran, Esmond G. Ng, Barry W. Peyton

A Structure-Aware Irregular Blocking Method for Sparse LU Factorization

In sparse LU factorization, nonzero elements after symbolic factorization tend to be distributed along the diagonal and in the bottom-right region of sparse matrices. However, regular 2D blocking on this non-uniform distribution structure may lead to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-05 Zhen Hu, Dongliang Xiong, Kai Huang, Changjun Wu, Xiaowen Jiang

GPU-Accelerated Cholesky Factorization of Block Tridiagonal Matrices

This paper presents a GPU-accelerated framework for solving block tridiagonal linear systems that arise naturally in numerous real-time applications across engineering and scientific computing. Through a multi-stage permutation strategy…

Optimization and Control · Mathematics 2026-01-08 Roland Schwan, Daniel Kuhn, Colin N. Jones

Randomized LU Decomposition Using Sparse Projections

A fast algorithm for the approximation of a low rank LU decomposition is presented. In order to achieve a low complexity, the algorithm uses sparse random projections combined with FFT-based random projections. The asymptotic approximation…

Numerical Analysis · Mathematics 2016-01-19 Yariv Aizenbud, Gil Shabat, Amir Averbuch

An inherently parallel H2-ULV factorization for solving dense linear systems on GPUs

Hierarchical low-rank approximation of dense matrices can reduce the complexity of their factorization from O(N^3) to O(N). However, the complex structure of such hierarchical matrices makes them difficult to parallelize. The block size and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-05 Qianxiang Ma, Rio Yokota

Parallel GPU-Accelerated Randomized Construction of Approximate Cholesky Preconditioners

We introduce a parallel algorithm to construct a preconditioner for solving a large, sparse linear system where the coefficient matrix is a Laplacian matrix (a.k.a., graph Laplacian). Such a linear system arises from applications such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-30 Tianyu Liang, Chao Chen, Yotam Yaniv, Hengrui Luo, David Tench, Xiaoye S. Li, Aydin Buluc, James Demmel

sTiles: An Accelerated Computational Framework for Sparse Factorizations of Structured Matrices

This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle…

Performance · Computer Science 2025-01-07 Esmail Abdul Fattah, Hatem Ltaief, Havard Rue, David Keyes

Faster and Cheaper: Parallelizing Large-Scale Matrix Factorization on GPUs

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massively multicore and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-25 Wei Tan, Liangliang Cao, Liana Fong

Basker: A Threaded Sparse LU Factorization Utilizing Hierarchical Parallelism and Data Layouts

Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-22 Joshua Dennis Booth, Sivasankaran Rajamanickam, Heidi K. Thornquist

Parallel Sparse Matrix Solver on the GPU Applied to Simulation of Electrical Machines

Nowadays, several industrial applications are being ported to parallel architectures. In fact, these platforms make it possible to obtain more performance for system modelling and simulation. In the electric machines area, there are many problems which…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-10-25 Antonio Wendell De Oliveira Rodrigues, Frédéric Guyomarch, Yvonnick Le Menach, Jean-Luc Dekeyser

Analysis of A Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards

We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large;…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-09-29 Ang Li, Radu Serban, Dan Negrut

Parallel Sub-Structuring Methods for solving Sparse Linear Systems on a cluster of GPU

The main objective of this work is to analyze sub-structuring methods for the parallel solution of sparse linear systems with matrices arising from the discretization of partial differential equations such as finite element, finite…

Numerical Analysis · Mathematics 2021-08-31 Abal-Kassim Cheik Ahamed, Frédéric Magoulès

Fast Parallel Newton-Raphson Power Flow Solver for Large Number of System Calculations with CPU and GPU

To analyze large sets of grid states, e.g. when evaluating the impact of uncertainties in renewable generation with probabilistic Monte Carlo simulation or in stationary time series simulation, a large number of power flow…

Computational Engineering, Finance, and Science · Computer Science 2021-04-29 Zhenqi Wang, Sebastian Wende-von Berg, Martin Braun

Fast and Green Computing with Graphics Processing Units for solving Sparse Linear Systems

In this paper, we aim to introduce a new perspective when comparing highly parallelized algorithms on GPU: the energy consumption of the GPU. We give an analysis of the performance of linear algebra operations, including addition of…

Numerical Analysis · Mathematics 2021-12-22 Abal-Kassim Cheik Ahamed, Alban Desmaison, Frederic Magoules

Parallel Triangular Solvers on GPU

In this paper, we investigate GPU based parallel triangular solvers systematically. The parallel triangular solvers are fundamental to incomplete LU factorization family preconditioners and algebraic multigrid solvers. We develop a new…

Mathematical Software · Computer Science 2016-06-03 Zhangxin Chen, Hui Liu, Bo Yang

An Efficient, Sparsity-Preserving, Online Algorithm for Low-Rank Approximation

Low-rank matrix approximation is a fundamental tool in data analysis for processing large datasets, reducing noise, and finding important signals. In this work, we present a novel truncated LU factorization called Spectrum-Revealing LU…

Numerical Analysis · Computer Science 2017-08-21 David G. Anderson, Ming Gu