Related papers: GLU3.0: Fast GPU-based Parallel Sparse LU Factoriz…

200 papers

Decomposing a matrix A into a lower triangular matrix L and an upper triangular matrix U, also known as LU decomposition, is an essential operation in numerical linear algebra. For a sparse matrix, LU decomposition often introduces more nonzero entries in…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-11 Anil Gaihre, Xiaoye S. Li, Hang Liu

Matrix Factorization (MF) on large-scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…

Machine Learning · Computer Science 2023-04-28 Prasad Bhavana, Vineet Padmanabhan

This article introduces HYLU, a hybrid parallel LU factorization-based general-purpose solver designed for efficiently solving sparse linear systems (Ax=b) on multi-core shared-memory architectures. The key technical feature of HYLU is the…

Hardware Architecture · Computer Science 2026-04-02 Xiaoming Chen

This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures,…

Numerical Analysis · Mathematics 2023-03-17 Tianshi Xu, Ruipeng Li, Daniel Osei-Kuffuor

The solution of sparse symmetric positive definite linear systems is an important computational kernel in large-scale scientific and engineering modeling and simulation. We will solve the linear systems using a direct method, in which a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-13 M. Ozan Karsavuran, Esmond G. Ng, Barry W. Peyton

In sparse LU factorization, the nonzero elements produced by symbolic factorization tend to concentrate along the diagonal and in the bottom-right region of the matrix. However, regular 2D blocking on this non-uniform distribution structure may lead to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-05 Zhen Hu, Dongliang Xiong, Kai Huang, Changjun Wu, Xiaowen Jiang

This paper presents a GPU-accelerated framework for solving block tridiagonal linear systems that arise naturally in numerous real-time applications across engineering and scientific computing. Through a multi-stage permutation strategy…

Optimization and Control · Mathematics 2026-01-08 Roland Schwan, Daniel Kuhn, Colin N. Jones

A fast algorithm for the approximation of a low rank LU decomposition is presented. In order to achieve a low complexity, the algorithm uses sparse random projections combined with FFT-based random projections. The asymptotic approximation…

Numerical Analysis · Mathematics 2016-01-19 Yariv Aizenbud, Gil Shabat, Amir Averbuch

Hierarchical low-rank approximation of dense matrices can reduce the complexity of their factorization from O(N^3) to O(N). However, the complex structure of such hierarchical matrices makes them difficult to parallelize. The block size and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-05 Qianxiang Ma, Rio Yokota

We introduce a parallel algorithm to construct a preconditioner for solving a large, sparse linear system where the coefficient matrix is a Laplacian matrix (a.k.a., graph Laplacian). Such a linear system arises from applications such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-30 Tianyu Liang, Chao Chen, Yotam Yaniv, Hengrui Luo, David Tench, Xiaoye S. Li, Aydin Buluc, James Demmel

This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle…

Performance · Computer Science 2025-01-07 Esmail Abdul Fattah, Hatem Ltaief, Havard Rue, David Keyes

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massively multicore and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-25 Wei Tan, Liangliang Cao, Liana Fong

Scalable sparse LU factorization is critical for efficient numerical simulation of circuits and electrical power grids. In this work, we present a new scalable sparse direct solver called Basker. Basker introduces a new algorithm to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-22 Joshua Dennis Booth, Sivasankaran Rajamanickam, Heidi K. Thornquist

Several industrial applications are now being ported to parallel architectures, since these platforms deliver higher performance for system modelling and simulation. In the electric machines area, there are many problems which…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-10-25 Antonio Wendell De Oliveira Rodrigues, Frédéric Guyomarch, Yvonnick Le Menach, Jean-Luc Dekeyser

We discuss an approach for solving sparse or dense banded linear systems ${\bf A} {\bf x} = {\bf b}$ on a Graphics Processing Unit (GPU) card. The matrix ${\bf A} \in {\mathbb{R}}^{N \times N}$ is possibly nonsymmetric and moderately large;…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-09-29 Ang Li, Radu Serban, Dan Negrut

The main objective of this work is to analyze the sub-structuring method for the parallel solution of sparse linear systems with matrices arising from the discretization of partial differential equations such as finite element, finite…

Numerical Analysis · Mathematics 2021-08-31 Abal-Kassim Cheik Ahamed, Frédéric Magoulès

To analyze large sets of grid states, e.g., when evaluating the impact of uncertainties in renewable generation with probabilistic Monte Carlo simulation or in stationary time series simulation, a large number of power flow…

Computational Engineering, Finance, and Science · Computer Science 2021-04-29 Zhenqi Wang, Sebastian Wende-von Berg, Martin Braun

In this paper, we introduce a new perspective for comparing highly parallelized algorithms on GPUs: the energy consumption of the GPU. We give an analysis of the performance of linear algebra operations, including addition of…

Numerical Analysis · Mathematics 2021-12-22 Abal-Kassim Cheik Ahamed, Alban Desmaison, Frederic Magoules

In this paper, we systematically investigate GPU-based parallel triangular solvers. Parallel triangular solvers are fundamental to preconditioners in the incomplete LU factorization family and to algebraic multigrid solvers. We develop a new…

Mathematical Software · Computer Science 2016-06-03 Zhangxin Chen, Hui Liu, Bo Yang

Low-rank matrix approximation is a fundamental tool in data analysis for processing large datasets, reducing noise, and finding important signals. In this work, we present a novel truncated LU factorization called Spectrum-Revealing LU…

Numerical Analysis · Computer Science 2017-08-21 David G. Anderson, Ming Gu