Related papers: The N-shaped partition method: A novel parallel im…
In this paper we present a methodology for data accesses when solving batches of Tridiagonal and Pentadiagonal matrices that all share the same left-hand-side (LHS) matrix. The intended application is to the numerical solution of Partial…
This paper presents a GPU-accelerated framework for solving block tridiagonal linear systems that arise naturally in numerous real-time applications across engineering and scientific computing. Through a multi-stage permutation strategy…
Edge-centric distributed computations have appeared as a recent technique to address the shortcomings of think-like-a-vertex algorithms on large scale-free networks. In order to increase parallelism on this model, edge partitioning -…
A novel overlapping domain decomposition splitting algorithm based on a Crank-Nicolson method is developed for the stochastic nonlinear Schroedinger equation driven by a multiplicative noise with non-periodic boundary conditions. The…
Block-tridiagonal systems are prevalent in state estimation and optimal control, and solving these systems is often the computational bottleneck. Improving the underlying solvers therefore has a direct impact on the real-time performance of…
We propose Distributed Neighbor Expansion (Distributed NE), a parallel and distributed graph partitioning method that can scale to trillion-edge graphs while providing high partitioning quality. Distributed NE is based on a new heuristic,…
A tridiagonal matrix algorithm (TDMA), Pipelined-TDMA, is developed for multi-GPU systems to resolve the scalability bottlenecks caused by the sequential structure of conventional divide-and-conquer TDMA. The proposed method pipelines…
This paper focuses on the parallel implementation of a direct $N$-body method (particle-particle algorithm) and the application of multiple GPUs for galactic dynamics simulations. Application of a hybrid OpenMP-CUDA technology is considered…
This paper applies the N-block PCPM algorithm to solve multi-scale multi-stage stochastic programs, with the application to electricity capacity expansion models. Numerical results show that the proposed simplified N-block PCPM algorithm,…
In this paper, an efficient parallel splitting method is proposed for the optimal control problem with parabolic equation constraints. The linear finite element is used to approximate the state variable and the control variable in spatial…
We are concerned with the fastest possible direct numerical solution algorithm for a thin-banded or tridiagonal linear system of dimension $N$ on a distributed computing network of $N$ nodes that is connected in a binary communication tree.…
In this thesis we develop techniques to efficiently solve numerical Partial Differential Equations (PDEs) using Graphics Processing Units (GPUs). Focus is put on both performance and reusability of the methods developed; to this end a…
Hypergraph partitioning is a recurring NP-hard problem in engineering; its efficient solution at scale hinges on parallelism. This work proposes a GPU-centric algorithm for multi-level hypergraph partitioning aimed at a specific set of…
This report provides an introduction to algorithms for fundamental linear algebra problems on various parallel computer architectures, with the emphasis on distributed-memory MIMD machines. To illustrate the basic concepts and key issues,…
A linearized numerical scheme is proposed to solve the nonlinear time fractional parabolic problems with time delay. The scheme is based on the standard Galerkin finite element method in the spatial direction, the fractional Crank-Nicolson…
This paper presents a heuristic for finding the optimum number of CUDA streams, using tools common to modern AI-oriented approaches and applied to the parallel partition algorithm. A time complexity model for the GPU realization of…
This paper introduces a second-order method for solving general elliptic partial differential equations (PDEs) on irregular domains using GPU acceleration, based on Ying's kernel-free boundary integral (KFBI) method. The method addresses…
In this paper we solve on GPUs massive problems with large amounts of data, which are not appropriate for solution with SIMD technology. For the given problem we consider a three-level parallelization. The multithreading of CPU is used…
Large-scale parallel numerical simulations are essential for a wide range of engineering problems that involve complex, coupled physical processes interacting across a broad range of spatial and temporal scales. The data structures involved…
Simulations of physical phenomena are essential to the expedient design of precision components in aerospace and other high-tech industries. These phenomena are often described by mathematical models involving partial differential equations…