Related papers: Efficient parallelization strategy for real-time F…

Parallel GPU-Accelerated Randomized Construction of Approximate Cholesky Preconditioners

We introduce a parallel algorithm to construct a preconditioner for solving a large, sparse linear system where the coefficient matrix is a Laplacian matrix (a.k.a., graph Laplacian). Such a linear system arises from applications such as…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-05-30 · Tianyu Liang, Chao Chen, Yotam Yaniv, Hengrui Luo, David Tench, Xiaoye S. Li, Aydin Buluc, James Demmel

Neural Acceleration of Incomplete Cholesky Preconditioners

The solution of a sparse system of linear equations is ubiquitous in scientific applications. Iterative methods, such as the Preconditioned Conjugate Gradient method (PCG), are normally chosen over direct methods due to memory and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-03-04 · Joshua Dennis Booth, Hongyang Sun, Trevor Garnett

GPU Accelerated Sparse Cholesky Factorization

The solution of sparse symmetric positive definite linear systems is an important computational kernel in large-scale scientific and engineering modeling and simulation. We will solve the linear systems using a direct method, in which a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-02-13 · M. Ozan Karsavuran, Esmond G. Ng, Barry W. Peyton

GPU-Accelerated Cholesky Factorization of Block Tridiagonal Matrices

This paper presents a GPU-accelerated framework for solving block tridiagonal linear systems that arise naturally in numerous real-time applications across engineering and scientific computing. Through a multi-stage permutation strategy…

Optimization and Control · Mathematics · 2026-01-08 · Roland Schwan, Daniel Kuhn, Colin N. Jones

sTiles: An Accelerated Computational Framework for Sparse Factorizations of Structured Matrices

This paper introduces sTiles, a GPU-accelerated framework for factorizing sparse structured symmetric matrices. By leveraging tile algorithms for fine-grained computations, sTiles uses a structure-aware task execution flow to handle…

Performance · Computer Science · 2025-01-07 · Esmail Abdul Fattah, Hatem Ltaief, Havard Rue, David Keyes

Gaussian Process Models with Parallelization and GPU acceleration

In this work, we present an extension of Gaussian process (GP) models with sophisticated parallelization and GPU acceleration. The parallelization scheme arises naturally from the modular computational structure w.r.t. datapoints in the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-10-21 · Zhenwen Dai, Andreas Damianou, James Hensman, Neil Lawrence

Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL

Many important real-world applications, such as System Identification with Gaussian Processes, involve solving linear systems with symmetric positive-definite matrices. The iterative CG method and direct solvers based on the Cholesky…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-14 · Tim Thüring, Alexander Strack, Dirk Pflüger

On the performance of various parallel GMRES implementations on CPU and GPU clusters

As the need for computational power and efficiency rises, parallel systems become increasingly popular among various scientific fields. While multiple core-based architectures have been the center of attention for many years, the rapid…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-06-11 · E. I. Ioannidis, N. Cheimarios, A. N. Spyropoulos, A. G. Boudouvis

A Two-level GPU-Accelerated Incomplete LU Preconditioner for General Sparse Linear Systems

This paper presents a parallel preconditioning approach based on incomplete LU (ILU) factorizations in the framework of Domain Decomposition (DD) for general sparse linear systems. We focus on distributed memory parallel architectures,…

Numerical Analysis · Mathematics · 2023-03-17 · Tianshi Xu, Ruipeng Li, Daniel Osei-Kuffuor

Improvement Cache Efficiency of Explicit Finite Element Procedure and its Application to Parallel Casting Solidification Simulation

A simple method for improving cache efficiency of serial and parallel explicit finite procedure with application to casting solidification simulation over three-dimensional complex geometries is presented. The method is based on division of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-05-19 · Ruhollah Tavakoli

Fast But Accurate: A Real-Time Hyperelastic Simulator with Robust Frictional Contact

We present a GPU-friendly framework for real-time implicit simulation of elastic material in the presence of frictional contacts. The integration of hyperelasticity, non-interpenetration contact, and friction in real-time simulations…

Graphics · Computer Science · 2025-03-20 · Ziqiu Zeng, Siyuan Luo, Fan Shi, Zhongkai Zhang

Accelerating Microswimmer Simulations via a Heterogeneous Pipelined Parallel-in-Time Framework

Simulating large-scale microswimmer dynamics in viscous fluid poses significant challenges due to the coupled high spatial and temporal complexity. Conventional high-performance computing (HPC) methods often address these two dimensions in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-15 · Ruixiang Huang, Weifan Liu

A parallel solver for a preconditioned space-time boundary element method for the heat equation

We describe a parallel solver for the discretized weakly singular space-time boundary integral equation of the spatially two-dimensional heat equation. The global space-time nature of the system matrices leads to improved parallel…

Numerical Analysis · Mathematics · 2021-02-23 · Stefan Dohr, Michal Merta, Günther Of, Olaf Steinbach, Jan Zapletal

Global finite element matrix construction based on a CPU-GPU implementation

The finite element method (FEM) has several computational steps to numerically solve a particular problem, to which many efforts have been directed to accelerate the solution stage of the linear system of equations. However, the finite…

Numerical Analysis · Computer Science · 2015-01-21 · Francisco Javier Ramírez-Gil, Marcos de Sales Guerra Tsuzuki, Wilfredo Montealegre-Rubio

GPU-based Parallel Computation Support for Stan

This paper details an extensible OpenCL framework that allows Stan to utilize heterogeneous compute devices. It includes GPU-optimized routines for the Cholesky decomposition, its derivative, other matrix algebra primitives and some…

Mathematical Software · Computer Science · 2020-05-19 · Rok Češnovar, Steve Bronder, Davor Sluga, Jure Demšar, Tadej Ciglarič, Sean Talts, Erik Štrumbelj

End-to-end GPU acceleration of low-order-refined preconditioning for high-order finite element discretizations

In this paper, we present algorithms and implementations for the end-to-end GPU acceleration of matrix-free low-order-refined preconditioning of high-order finite element problems. The methods described here allow for the construction of…

Mathematical Software · Computer Science · 2023-06-05 · Will Pazner, Tzanio Kolev, Jean-Sylvain Camier

Tuning Spectral Element Preconditioners for Parallel Scalability on GPUs

The Poisson pressure solve resulting from the spectral element discretization of the incompressible Navier-Stokes equation requires fast, robust, and scalable preconditioning. In the current work, a parallel scaling study of…

Numerical Analysis · Mathematics · 2021-12-14 · Malachi Phillips, Stefan Kerkemeier, Paul Fischer

GPU Accelerated Finite Element Assembly with Runtime Compilation

In recent years, high performance scientific computing on graphics processing units (GPUs) has gained widespread acceptance. These devices are designed to offer massively parallel threads for running code with general purpose. There are…

Mathematical Software · Computer Science · 2018-02-13 · Tao Cui, Xiaohu Guo, Hui Liu

Analysis of heterogeneous computing approaches to simulating heat transfer in heterogeneous material

The simulation of heat flow through heterogeneous material is important for the design of structural and electronic components. Classical analytical solutions to the heat equation PDE are not known for many such domains, even those having…

Numerical Analysis · Mathematics · 2019-05-21 · Andrew Loeb, Christopher Earls

Parallel Sub-Structuring Methods for solving Sparse Linear Systems on a cluster of GPU

The main objective of this work consists in analyzing sub-structuring method for the parallel solution of sparse linear systems with matrices arising from the discretization of partial differential equations such as finite element, finite…

Numerical Analysis · Mathematics · 2021-08-31 · Abal-Kassim Cheik Ahamed, Frédéric Magoulès